Abstract



In the last few years many real-world networks have been found to show a so-called community structure organization.
Much effort has been devoted in the literature to developing methods and algorithms that can efficiently highlight this
hidden structure of the network, traditionally by partitioning the graph. Since network representation can be very
complex and can contain different variants of the traditional graph model, each algorithm in the literature focuses on
some of these properties and establishes, explicitly or implicitly, its own definition of community. According to this
definition it then extracts communities that reflect only some of the features of real communities. The
aim of this survey is to provide a manual for the community discovery problem. Given a meta definition of what a
community in a social network is, our aim is to organize the main categories of community discovery based on their
own definition of community. Given a desired definition of community and the features of a problem (size of network,
direction of edges, multidimensionality, and so on), this review paper is designed to provide a set of approaches that
researchers could focus on.
1. Introduction

A complex network is a mathematical model of interaction
phenomena that take place in the real world, which
has proved to be a powerful computational basis for the analysis
of such phenomena. One critical problem, which has
been widely studied in the literature since the early analysis
of complex networks, is the identification of communities
hidden within the structure of these networks.

A community is intuitively understood as a set of entities
where each entity is closer, in the network sense, to the
other entities within the community than to the entities
outside it. Therefore, communities are groups of entities
that presumably share some common properties and/or
play similar roles within the interacting phenomenon that
is being represented. Community detection is important
for many reasons, including node classification, which entails
identifying homogeneous groups, group leaders or crucial group
connectors. Communities may correspond to groups of
pages of the World Wide Web dealing with related topics,
to functional modules such as cycles and pathways in
metabolic networks [?], to groups of related individuals
in social networks [?], and so on.

Community discovery has analogies to the clustering
problem, a traditional data mining task. In data mining,
clustering is an unsupervised learning task, which aims
to partition large sets of data into homogeneous groups
(clusters). In fact, community discovery can be viewed as
a data mining analysis on graphs: an unsupervised classification
of its nodes. In addition, community discovery
is the most studied data mining application on social networks.
Other applications, such as graph mining [?], are
still in an early phase of their development. Community
discovery, instead, has reached a more advanced stage,
with contributions from different fields, such as
statistical physics.

Nevertheless, this is only part of the community discovery
problem. In classical data mining clustering, we have
data that is not in a relational form. Thus, in this general
form, the fact that the entities are nodes connected to each
other through edges has not been thoroughly explored.
Spatial proximity needs to be mapped to network proximity
between entities represented as vertices in a graph.

The most accepted definition of proximity in a net- 
work is based on the topology of its edges. In this case 
the definition of community is formulated according to 
the differences in the densities of links in different parts of 
the network. Many networks have been found to be non- 
homogeneous, consisting not of an undifferentiated mass 
of vertices, but of distinct groups. Within these groups 
there are many edges between vertices, but between groups 
there are fewer edges. The aim of a community detection
algorithm is, in this case, to divide the vertices of a network
into some number k of groups, while maximizing the
number of edges inside these groups and minimizing the
number of edges established between vertices in different
groups. These groups are the desired communities of the
network.

This definition proves vague and imprecise as the complexity
of network representations increases and novel analytical
settings emerge, such as information propagation
or multidimensional network analysis. For example, in a
temporally evolving setting, two entities can be considered
close to each other if they share a common action profile
even if they are not directly connected. Often, a
novel approach to community discovery is designed to face
a specific problem and develops its own definition
of community.

In addition to the variety of different definitions of 
community, communities have a number of interesting fea- 
tures. They can exhibit a hierarchical or overlapping con- 
figuration of the groups inside the network. Or else the 
graph can include directed edges, thus giving importance 
to this direction when considering the relations between 
entities. The communities can be dynamic, i.e., evolving
over time, or multi-relational, i.e., there could be multiple
relations, with sets of individuals that behave as isolated
entities in each single relation of the network but form a dense
community when all the possible relations are considered at
the same time.

As a result, this extreme richness of definitions and features
has led to the publication of an impressive number
of excellent solutions to the community discovery problem.
It is therefore not surprising that there are a number of
review papers describing all these methods, such as [?].

We believe that a new point of view is needed to orga- 
nize the body of knowledge about community detection, 
shifting the focus from how communities are detected to
what kind of communities we are interested in detecting. Existing
reviews tend to analyze the different techniques from
a procedural perspective. They cluster the different algo- 
rithms according to their operational method, not accord- 
ing to the definition of community they adopt in the first 
place. Nevertheless, there are many different ways to con- 
ceive a community within a network, as acknowledged also 
by [?], where the authors maintain that "[all the methods] require
us to know what we are looking for in advance before
we can decide what to measure"; here "know what we
are looking for" clearly means to define what a community
really is. To use a metaphor, existing reviews talk about
the bricks and mortar that make up a building with no
mention of its architectural style. In other words, the
aim of the previous reviews is to talk to people interested 
in building a new community detection algorithm, rather 
than those who want to use the methods presented in the 
literature. Our aim is precisely the latter. 

We have thus chosen to cluster the community discovery
algorithms by considering their reference definition
of what a community is, which depends on what kinds
of groups they aim to extract from the network. For
each algorithm we record the characteristics of the output
of the method, thus highlighting which sets of features
the reviewed algorithm is suitable or not suitable
for. We also consider some general frameworks that provide
both a community discovery approach and a general
technique. These frameworks can be applied to other graph
partitioning algorithms, adding new features to them.

The paper is organized as follows. In Section 2 we pro- 
vide a general definition of the community discovery prob- 
lem and the meta definition of what a community is. In 
Section 3 we explain the classification of algorithms based 
on community definitions. Then, from Sections 4 to 12, we 
present the main categories of approaches given our prob- 
lem definition, along with what we consider to be the most 
important works in each given category. In Section 13 we 
provide various evaluation measurements over a collection 
of reviewed methods on a benchmark graph. Section 14
reviews related work, namely other reviews of community
discovery in social networks, along with the rationale
behind the novel approach to these methods provided
in this paper. Finally, Section 15 concludes the survey and
outlines possible future work.

2. Problem Definition 

2.1. Problem Representation 

Let us assume that we have a graph G denoted by a
quadruple G = (V, E, L, C), where V is a set of labeled
nodes, E is a set of labeled edges, L is a set of edge labels
and C is a set of node labels. E is a set of quadruples of the
form (u, v, l, w), where u, v ∈ V are nodes, l ∈ L is a label
and w is an integer that represents the weight of the relation.
We assume that, given a pair of nodes u, v ∈ V and a
label l ∈ L, only one edge (u, v, l, w) may exist; however, the
direction of the edge is considered in the model, thus edges
(u, v, l, w) and (v, u, l, w) are considered distinct. We also
assume that each node can be labeled with one or more
categories c ∈ C. In addition, we consider the temporal evolution
of the network. Thus each edge, and node, can be
labeled with an arbitrary number of timestamps that represent
the time in which the edge appears and disappears
in the network. The labels of a given node can also change
over time. Note that nodes can create/delete edges in the
network and/or change/introduce/delete one or more labels
in their category set. We call such events "actions"
that are performed by the nodes.
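The representation above can be sketched as a small data structure. This is only an illustrative sketch, not part of the original model specification: the class name `LabeledGraph`, its field names and the `(appears, disappears)` interval encoding of timestamps are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# (u, v, l): direction matters, so (v, u, l) is a different key.
EdgeKey = Tuple[str, str, str]


@dataclass
class LabeledGraph:
    """Hypothetical container for G = (V, E, L, C); all names are illustrative."""
    node_labels: Dict[str, Set[str]] = field(default_factory=dict)  # node -> categories in C
    weights: Dict[EdgeKey, int] = field(default_factory=dict)       # (u, v, l) -> integer weight w
    # (u, v, l) -> list of (appears, disappears) timestamps; None means "still present"
    lifetimes: Dict[EdgeKey, List[Tuple[int, Optional[int]]]] = field(default_factory=dict)

    def add_node(self, node: str, *categories: str) -> None:
        self.node_labels.setdefault(node, set()).update(categories)

    def add_edge(self, u: str, v: str, label: str, weight: int = 1,
                 appears: int = 0, disappears: Optional[int] = None) -> None:
        key = (u, v, label)
        # Only one edge may exist for a given ordered pair and label.
        if key in self.weights:
            raise ValueError(f"edge {key} already exists")
        self.add_node(u)
        self.add_node(v)
        self.weights[key] = weight
        self.lifetimes[key] = [(appears, disappears)]

    def dimensions(self) -> Set[str]:
        """Edge labels in use, i.e. the relations (dimensions) of the network."""
        return {label for (_, _, label) in self.weights}
```

Keying edges by the ordered triple (u, v, l) enforces both constraints stated in the text: at most one edge per pair and label, and distinct edges for the two directions.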

With this complex model we can represent all possible 
variants in a graph of a complex real world phenomenon. 
For example, we can model multi-relational networks by 
considering the edge labels L as the different relations (di- 
mensions) of the network. We can also represent simpler 
models, such as unweighted networks, by assigning the 
same weight w = 1 to every edge in the network. Hereafter 
we will use the notation presented in Table 1. We introduce
new symbols and notations when they are needed for
the presentation of one particular method but not useful
for the others.

Symbol   Description
n        Number of vertices of the network
m        Number of edges of the network
k        Number of communities of the network
k̄        Avg degree of the network
K        Max degree in the network
T        Number of actions in the network
A        Max number of actions for a node
D        Number of dimensions (if any)
c        Number of vertex types (if any)
t        Number of time steps (if any)

Table 1: Summary of the main notation used in the paper.

2.2. Community Meta Definition 

We will now present our meta definition of a commu- 
nity in a complex network. With this meta definition we 
create an underlying concept which is the basis behind 
this survey and includes all the possible definition vari- 
ants present in the literature. 

Meta Definition 1 (Community). A community in a 
complex network is a set of entities that share some closely 
correlated sets of actions with the other entities of the com- 
munity. Here we consider direct connection as a particu- 
lar, and very important, kind of action. 

The aim of a community discovery algorithm is to iden- 
tify these communities in the network. The desired result 
is a list of sets of entities grouped together. Starting from 
this meta definition we can model the main aspects of dis- 
covering communities in complex networks. 

Density-based definitions. In this classical setting, 
as we mentioned in the Introduction, the definition is en- 
tirely based on the topology of the network edges. The 
community is defined as a group in which there are many 
edges between vertices, but between groups there are fewer 
edges. The aim of a community detection algorithm is to 
divide the vertices of a network into some number k of 
groups, while maximizing the number of edges inside these 
groups and minimizing the number of edges that run be- 
tween vertices in different groups. In our definition we 
consider the connection between two vertices a particular 
kind of action. Hence, if we group entities by maximizing 
their common actions, we also group them by maximizing 
the edges inside the community. Community discovery is 
exactly the same if the edge creation is the only action 
recorded in the network representation. In addition, by 
considering different kinds of sets of action in the meta 
definition, we can also model the overlapping situation: for 
certain sets of actions (i.e. connections) a node belongs to 
one community, for another set of actions, it belongs to 
another community. 
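As a minimal sketch of the density-based criterion, the following function counts the edges that fall inside groups and the edges that run between groups for a given partition. The function name `edge_counts` and the `membership` mapping are assumptions introduced for illustration; a density-based method would look for the partition that makes the first count large and the second small.

```python
def edge_counts(edges, membership):
    """Return (intra, inter): edges inside groups vs. edges between groups.

    edges      -- iterable of (u, v) pairs of an undirected graph
    membership -- dict mapping each node to its group identifier
    """
    intra = inter = 0
    for u, v in edges:
        if membership[u] == membership[v]:
            intra += 1
        else:
            inter += 1
    return intra, inter


# Two triangles joined by a single edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
membership = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(edge_counts(edges, membership))  # (6, 1)
```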




(c) Weighted Communities 



Figure 1: Different community features.

Vertex similarity-based definitions. As pointed
out by Fortunato [?], it is natural to assume that communities
are groups of vertices that are similar to each
other. One can compute the similarity between each pair
of vertices with respect to some reference property, local or
global, irrespective of whether or not they are connected
by an edge. Each vertex ends up in the cluster whose vertices
are the most similar to it. By considering an evolving
setting in our problem representation, together with the
presence or absence of a particular property (i.e. a label
of the vertex), we can model the similarity measures as
the similarity of the set of actions.
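One possible similarity between two vertices is the Jaccard coefficient of their neighbor (or action) sets. The choice of Jaccard is an assumption made for this sketch; the text does not prescribe a specific measure.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two vertices that are not directly connected can still be similar
# because they share most of their neighbors.
neighbors = {
    "u": {"x", "y", "z"},
    "v": {"y", "z", "w"},
}
print(jaccard(neighbors["u"], neighbors["v"]))  # 0.5
```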

Action-based definitions. In this setting, which is
gaining increasing attention in the literature, entities can
be grouped by the set of actions they perform inside the
network. For example, in [?] a multi-mode network is considered
in which users are connected to queries and ads.
Two users are seen as being part of the same community if
they are connected to the same queries (i.e. they perform
the same actions) even if they are not directly linked to
each other. The discovery of communities based on this
definition can be performed with or without considering the
presence of a direct link between entities. Both cases are included
in our meta definition.
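The query example can be sketched as follows: users are merged into the same group when they share enough actions, regardless of any direct link between them. The threshold `min_shared` and the use of connected components (via union-find) are illustrative assumptions, not the method of the cited work.

```python
from itertools import combinations


def action_communities(actions, min_shared=2):
    """Group users whose action sets share at least `min_shared` elements.

    actions -- dict mapping each user to the set of actions (e.g. queries) performed
    Returns a list of sets of users, sorted for reproducibility.
    """
    parent = {user: user for user in actions}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(actions, 2):
        if len(actions[a] & actions[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for user in actions:
        groups.setdefault(find(user), set()).add(user)
    return sorted(groups.values(), key=sorted)


queries = {
    "ann": {"q1", "q2", "q3"},
    "bob": {"q1", "q2"},
    "cat": {"q7", "q8"},
    "dan": {"q7", "q8", "q9"},
}
print(action_communities(queries))  # ann and bob together, cat and dan together
```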

Influence Propagation-based definitions. In some
works, the concept of a "tribe" has been introduced. In
[?], a tribe is defined as a set of entities that are influenced
by the same leaders. A node is a leader if it has performed
an action and, within a chosen time bound after this action,
a sufficient number of other users have performed the
same action. The role of social ties in this influence spread
is considered. Thus, according to our definition, the set of
users that frequently perform the same actions due to the
influence of their leaders are considered as being a community.
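The leader condition can be sketched on a plain action log. This is a simplified illustration under stated assumptions: the log format `(user, action, time)` and the parameter names `time_bound` and `min_followers` are hypothetical, and the sketch ignores the role of social ties, which the original definition takes into account.

```python
def find_leaders(log, time_bound, min_followers):
    """Return the users that act as leaders for at least one action.

    log -- list of (user, action, time) tuples
    A user is a leader if, within `time_bound` after one of its actions,
    at least `min_followers` other users performed the same action.
    """
    leaders = set()
    for user, action, t in log:
        followers = {
            other
            for other, other_action, t2 in log
            if other != user and other_action == action and t < t2 <= t + time_bound
        }
        if len(followers) >= min_followers:
            leaders.add(user)
    return leaders


log = [("u1", "buy", 0), ("u2", "buy", 1), ("u3", "buy", 2), ("u4", "buy", 10)]
print(find_leaders(log, time_bound=3, min_followers=2))  # {'u1'}
```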

2.3. Problem Features 

There are many features to be considered in the complex
task of detecting communities in graph structures.
In this section we present some of the features an analyst
may be interested in when discovering network communities.
We will use them to evaluate the reviewed algorithms in
Table 2 and also to motivate our classification in Section
3.

Table 2 records the main properties of a community 
discovery algorithm. These properties can be grouped into 
two classes. The first class considers the features of the 
problem representation, the second the characteristics of 
the approach. 

Within the first class of features we group together all 
the possible variants in the representation of the original 
real world phenomenon. The most important features we 
consider are: 

• Overlapping. In some real world networks, communities
can share one or more common nodes. For
example, in social networks actors may be part of different
communities: work, family, friends and so on.
All these communities will share a common member,
and usually more since a work colleague can also be
a friend outside the working environment. Figure
1(a) shows an example of possible overlapping community
partitions: the central node is shared by the
two communities. Table 2 indicates if an algorithm
considers this feature in the "Overlap" column.

• Directed. Some phenomena in the real world must
be represented with edges and links that are not reciprocal.
This, for example, is the case of the web
graph: a hyperlink from one page to another is directed
and the other page may not have another hyperlink
pointing in the other direction. Figure 1(b)
shows an example in which the direction of the edges
should be considered. The leftmost node is connected
to the community, but only in one direction.
If reciprocity is an important feature, the leftmost
node should be considered outside the depicted community.
See "Dir" column in Table 2.

• Weighted. A group of connected vertices can be
considered as a community only if the weights of
their connections are strong enough, i.e. over a given
threshold. In the case of Figure 1(c) the left group
might not be strong enough to form a community.
See "Weight" column in Table 2.

• Dynamic. Following our problem representation in
Section 2.1, in our setting we have a set of edges
that can appear and disappear. Thus, communities
might also evolve over time. See "Dyn" column in
Table 2.

The second class of features collects various desired 
properties that an approach might have. These features 
can specify constraints for input data, improve the ex- 
pressive power of the results or facilitate the community 
discovery task. 

• Parameter free. A desired feature of an algorithm,
especially in data mining research, is the absence of
parameters. In other words, an algorithm should be
able to make explicit the knowledge that is hidden
inside the data without needing any further information
from the analyst regarding the data or the
problem (for instance the number of communities).
See "NoPar" column in Table 2.

• Multidimensional input. Multidimensionality in
networks is an emerging topic [10], [11], [12]. A network
is said to be multidimensional if it contains a
number of different kinds of relations that are established
between the nodes of the network. Thus,
when dealing with multiple dimensions, the notion of
community changes. Our proposed Meta Definition
1 captures this complex environment by representing
the creation or the absence of a particular edge in a
particular dimension with an action. This concept
of multidimensionality is used (with various names:
multi-relational, multiplex, and so on) by some approaches
as a feature of the input considered by the
approach. See "MDim" column in Table 2.

• Incremental. Another desired feature of an algorithm
is its ability to provide an output without an
exhaustive search of the entire input. An incremental
approach to community discovery is to classify
a node in one community by looking only at its
neighborhood, or the set of nodes two hops away.
Alternatively, newcomers are put in one of the previously
defined communities without starting the community
detection process from the beginning. See
"Incr" column in Table 2.

• Multipartite input. Many community discovery
approaches work even if the network has the particular
form of a multipartite graph. The multipartite
graph, however, is not entirely a feature of the input
that we might want to consider for the output. Many
algorithms often use a (usually) bipartite projection
of a classical graph in order to apply efficient computations.
As in the case of multidimensionality, this
is the reason for including the multipartite input as
a feature of the approach and not of the output. See
"Multip" column in Table 2.

There is one more "meta feature" that we consider. 
This is the possibility of applying the considered approach 
to another community discovery technique by adding new 
features to the "guest method" . This meta feature will be 
highlighted with an asterisk next to the algorithm's name. 

Table 2 also has a "Complexity" column that gives 
the time complexity of the methods presented. The two 
"BES" columns give the Biggest Experiment Size, in terms 
of nodes ("BESn") and edges ("BESm"), that are included 
in the original paper reviewed. Note that the Complexity
and BES columns often offer an estimate of the actual
values, since the original work did not provide an explicit
and clear analysis of the complexity or of the experimental
setting. A question mark indicates where evaluating
the complexity would not be straightforward, or where no
experimental details are provided.

3. The Definition-based classification 

We now review community detection approaches. In 
each section we group together all the algorithms that 
share the same definition of what a community is, i.e. the 
same conditions satisfied by a group of entities that allow 
them to be clustered together in a community. 

This classification is the main contribution of the paper 
and it should help to get a higher level view of the universe 
of graph clustering algorithms, by uncovering a practical 
and reasoned point of view for those analysts seeking to 
obtain precise results in their analytical problems. 

The proposed categories are the following: 

• Feature Distance (Section 4). Here we collect all
the community discovery approaches that start from
the assumption that a community is composed of
entities which ubiquitously share a very precise set of
features, with similar values (i.e. defining a distance
measure on their features, the entities are all close
to each other). A common feature can be an edge
or any attribute linked to the entity (in our problem
definition: the action). Usually, these approaches
propose this community definition in order to apply
classical data mining clustering techniques, such as
the Minimum Description Length principle [13], [14].

• Internal Density (Section 5). In this group we consider
the most important articles that define community
discovery as a process driven by directly detecting
the denser areas of the network.

• Bridge Detection (Section 6). This section includes
the community discovery approaches based on
the concept that communities are dense parts of the
graph among which there are very few edges that
can break the network down into pieces if they are
removed. These edges are "bridges" and the components
of the network resulting from their removal
are the desired communities.

• Diffusion (Section 7). Here we include all the approaches
to the community discovery task that rely
on the idea that communities are groups of nodes
that can be influenced by the diffusion of a certain
property or information inside the network. In addition,
the community definition can be narrowed
down to the groups that are only influenced by the
very same set of diffusion sources.

• Closeness (Section 8). A community can also be
defined as a group of entities that can reach each of
its own community companions with very few hops
on the edges of the graph, while the entities outside
the community are significantly farther apart.

• Structure (Section 9). Another approach to community
discovery is to define the community exactly
as a very precise and almost immutable structure of
edges. Often these structures are defined as a combination
of smaller network motifs. The algorithms
following this approach define some kinds of structures
and then try to find them efficiently inside the
graph.

• Link Clustering (Section 10). This class can be
viewed as a projection of the community discovery
problem. Instead of clustering the nodes of a network,
these approaches state that it is the relation
that belongs to a community, not the node. Therefore
they cluster the edges of the network and thus
the nodes belong to the set of communities of their
edges.

• No Definition (Section [?]). There are a number of
community discovery frameworks which do not have 
a basic definition of the characteristic of the commu- 
nity they want to explore. Instead they define vari- 
ous operations and algorithms to combine the results 
of various community discovery approaches and then 
use the target method community definition for their 
results. Alternatively, they let the analyst define his 
/ her own notion of community and search for it in 
the graph. 
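The Bridge Detection idea in the list above can be illustrated with a deliberately simple sketch. It uses the strict graph-theoretic notion of a bridge (an edge whose removal disconnects the graph) and a brute-force test, which is an assumption made for clarity: the category described in the text is looser ("very few edges" between dense parts) and the reviewed methods are not reproduced here. The function names are hypothetical.

```python
def components(nodes, edges):
    """Connected components of an undirected graph, in order of first node seen."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            x = stack.pop()
            component.add(x)
            for y in adjacency[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        result.append(component)
    return result


def bridge_communities(nodes, edges):
    """Remove every bridge and return (bridges, remaining components).

    Brute force: an edge is a bridge if dropping it increases the number of
    connected components. Cost is O(m * (n + m)), fine for small examples only.
    """
    base = len(components(nodes, edges))
    bridges = [e for e in edges
               if len(components(nodes, [f for f in edges if f != e])) > base]
    kept = [e for e in edges if e not in bridges]
    return bridges, components(nodes, kept)


nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
print(bridge_communities(nodes, edges))
```

On this toy graph the only bridge is (c, d), and removing it leaves the two triangles as communities.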

In each section we clarify which of the features presented in
the previous section are derived naturally in a particular
community discovery category, and which features
are naturally difficult to achieve. We are not formally
building an axiomatic approach, such as the one built in
[15] for spatial clustering. Instead, we are using the features
presented and an experimental setting to make the
rationale and the properties of each category in this classification
more explicit. The experiments made to support
this point are presented in Section 13.

Where possible, we also provide a simple graphical ex- 
ample of the definition considered. This graphical example 
will provide the main properties of the given classification, 
in terms of the strong and weak points in particular com- 
munity features. 

The aim of this survey is to focus on the most re- 
cent approaches and on the more general definitions of 
community. We will not focus on historical approaches. 
Some examples of classical clustering algorithms that have
not been extensively reviewed are the Kernighan-Lin algorithm
[16] or the classical spectral bisection approach [17].
Thus, for a historical point of view on the community discovery
problem, please refer to other review papers.

3.1. The Classification Overlap 

There is a sort of overlap for some community definitions.
For example a definition of internal density may
also include communities with sparse external links, i.e.
bridges. We will see in Section 5 that in this definition
a key concept is modularity [18]. Modularity is a quality
function which considers both the internal density of
a community and the absence of edges between commu- 
nities. Thus methods based on modularity could be clus- 
tered in both categories. However, the underlying defini- 
tion of modularity focuses on the internal density, which is 
the reason for the proposed classification. To give another 
example, a diffusion approach may detect the same com- 
munities whose members can reach each other with just a 
few hops. However this is not always the case: the diffu- 
sion approach may also find communities with an arbitrary 
distance between its members. 
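To make the role of modularity concrete, the following sketch computes it for an undirected, unweighted graph and a given partition. The text above describes modularity only qualitatively; the formula used here, Q = sum over communities of [e_c / m - (d_c / 2m)^2], with e_c the number of internal edges, d_c the total degree of the community and m the number of edges, is the standard form and is stated as an assumption of this example. The function name is illustrative.

```python
def modularity(edges, membership):
    """Modularity of a partition of an undirected, unweighted simple graph.

    edges      -- list of (u, v) pairs, each edge listed once
    membership -- dict mapping each node to its community identifier
    """
    m = len(edges)
    internal = {}    # community -> number of edges with both endpoints inside
    degree_sum = {}  # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Two triangles joined by one edge: dense inside, sparse between.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in "abcdef"}
print(modularity(edges, two_groups))  # 5/14, about 0.357
print(modularity(edges, one_group))   # 0.0
```

The first term rewards internal density and the second penalizes what a random graph with the same degrees would produce, which is why methods based on this function could be read as belonging to both the Internal Density and the Bridge Detection categories.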

Many approaches in the literature do not explicitly de- 
fine the communities they want to detect or, worse, they 
generically claim that their aim is to find dense modules 
of the network. This is not a problem for us, since the 
underlying community definition can be inferred from a 
high-level understanding of the approach described in the 
original paper. One cannot expect researchers to be able 
to categorize their method before an established catego- 
rization has been accepted. To instigate a discussion re- 
garding this issue is one of the aims of this paper. Once 
further knowledge regarding the field has been established, 
authors will be able to correctly categorize their approach. 

In order to gain stronger evidence of the differences between
the proposed categories, consider Figures 2, 4, 7, 9,
[?] and 13. These figures depict the simplest typical communities
that have been identified from the definitions of
Feature Distance, Internal Density, Bridge Detection, Diffusion,
Closeness and Structure Definition, respectively. As
can be seen, there are a number of differences between
these examples. The Bridge Detection example (Figure 7)
is a random graph, thus with no community structure defined
for the algorithms in the Internal Density category.
The Diffusion example (Figure 9) is also a random graph;
however, although the diffusion process identifies two communities,
no clear bridges can be detected.

The overlap is due to the fact that many algorithms
work with some general "background" meta definitions of
community. The categories proposed here can be clustered
together into a hierarchy with the four main categories
described in Section 2.2. Further, many algorithms
may present common strategies in the exploration of the
search space or in evaluating the quality of their partition
in order to refine it. Consider for example [19] and [20].
In these two papers there is a thorough theoretical study
concerning modularity and its most general form. In [19],
for example, the authors were able to derive modularity as
a random walk exploration strategy, thus highlighting its
overlap with the algorithms clustered here in the "Closeness"
category.

Evaluating the overlap and the relationships between 
the most important community discovery approaches is 
not simple, and is outside the scope of this survey. Here 
we focus on the connection between an algorithm and its 



particular definition of community. Thus we can create 
our useful high-level classification to connect the needs of 
particular analyses (i.e. the community definitions) to the 
tools available in the literature. To study how to derive 
one algorithm in terms of another, thus creating a graph of 
algorithms and not a classification, is an interesting open 
issue which we will leave for future research. 

4. Feature Distance 

In this section we review the community discovery meth- 
ods that define a community according to this meta defi- 
nition: 

Meta Definition 2 (Feature Community). A feature 
community in a complex network is a set of entities that 
share a precise set of features (including the edge as a fea- 
ture). Defining a distance measure based on the values of 
the features, the entities inside a community are very close 
to each other, more than the entities outside the commu- 
nity. 

This meta definition operates according to the follow- 
ing meta procedure: 

Meta Procedure 1. Given a set of entities and their at- 
tributes (which may be relations, actions or properties), 
represent them as a vector of values according to these at- 
tributes and thus operate a matrix/spatial clustering on the 
resulting structure. 

Using this definition the task of finding communities is 
very similar to the classical clustering problem in data min- 
ing. In data mining, clustering is an unsupervised learning 
task. The aim of a clustering algorithm is to assign a large 
set of data into groups (clusters) so that the data in the 
same clusters are more similar to each other than any other 
data in any other cluster. Similarity is defined through a 
distance measure, usually based on the number of com- 
mon features of the entities, or on similar values of these 
attributes. 
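Meta Procedure 1 can be sketched with a plain k-means loop over feature vectors. This is only one possible instance of "matrix/spatial clustering", chosen for brevity; the initial centers are passed in explicitly to keep the example deterministic, and the function name, signature and toy vectors are assumptions of the sketch.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means on tuples of numbers, using squared Euclidean distance.

    points  -- list of feature vectors (one per entity)
    centers -- initial cluster centers (one per desired cluster)
    Returns (clusters, centers), where clusters[i] lists the points assigned to center i.
    """
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point goes to its closest center.
        clusters = [[] for _ in centers]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return clusters, centers


# Each entity is described by two attribute values.
vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters, centers = kmeans(vectors, centers=[(0, 0), (10, 10)])
print(clusters)
```

Note that nothing in the procedure looks at edges unless they are encoded as attributes of the vectors, which is exactly why the graph structure can lose importance in this category.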



An example of a clustering technique is K-means [56].
Natural clustering approaches to community discovery
are some evolutions of co-clustering [57, 58] and/or
some spectral approaches to the clustering problem [59].
In [60] there is a survey on co-clustering algorithms, while
in [15] there is an interesting axiomatic framework for spatial
clustering. Given the rich literature and methods to
cluster matrices, in this category community discovery approaches
may find clusters with virtually any feature we
presented. Table 2 illustrates this: the feature sets of the
methods present in this category show a high entropy.
Given the fact that each node and edge is represented
by a set of attributes, it is very easy to obtain
multidimensional and multi-partite results by simply clustering
it in a complex multidimensional space.

In order to understand the downsides of this category, 
consider Figure 2, which depicts a network whose nodes
are positioned according to a distance measure. This mea-
sure could consider the direct edge connection, but this
is not mandatory. The nodes are then grouped into the
same community if they are close in this space (which
may be highly dimensional depending on the number of
features considered). Figure 2 shows that, depending on
the number of node/edge attributes, the underlying graph
structure may lose importance. This may lead to counter-
intuitive results if the analyst tries to display the clusters
by only looking at the graph structure, thus resulting in
a lot of inter-community edges. We will discuss this point
further in a later section.

Table 2 (excerpt, Feature Distance rows). Columns: Name, Overlap,
Dir, Weight, Dyn, NoPar, MDim, Incr, Multip, Complexity, BESn, BESm,
Year, Ref. The per-feature check marks are not legible in this
excerpt and are omitted.

Name            Complexity         BESn   BESm   Year   Ref
Evolutionary*                      5k     7      2006   [21]
RSN-BD          O(n^2 ck)          6k     3M     2006   [22]
SocDim          O(n^2 log n)*      80k    6M     2009   [23]
PMM             O(mn^2)            15k    27M    2009   [24]
MRGC            (row truncated)

Figure 2: An example of a graph that can be partitioned with a
notion of "distance" between its nodes.

Here we focus on some clustering techniques with some
very interesting features: the Evolutionary clustering [21];
RSN-BD [22], a k-partite graph based approach; MRGC
[25], a clustering technique working with tensors;
two approaches that use modularity for the detection of
latent dimensions for a multidimensional community dis-
covery with a machine learning classifier that maximizes
the number of common features ([23] and [13]); a Bayesian
approach to clustering based on the predictability of the
features for nodes belonging to the same group [26]; and an
analysis of the shared attribute connections in a bipartite
entity-attribute graph [27].

An interesting clustering principle is the Minimum De-
scription Length (MDL) principle [?]. In MDL the main
concept is that any regularity in the data (i.e. common
features) can be used to compress it, i.e. to describe it
using fewer symbols than the number of symbols needed
to describe the data literally (see also [?] and [?]). The
more regularities there are, the more the data can be com-
pressed. This is a very interesting approach since, in some
implementations, it enables the community discovery to
be performed without setting any parameters. After con-
sidering the classical clustering approaches, in this section
we also present three main algorithms that implement an
MDL community discovery approach: AutoPart [28] (that
is, to the best of our knowledge, the first popular commu-
nity discovery approach that formulates the ground theory for
MDL community detection), the Context-specific Cluster
Tree [29], and Timefall [30].

4.1. Evolutionary* [21]

In [21] the authors tackle the classical clustering prob-
lem by adding a temporal dimension. This novel situation
includes several constraints: 

• Consistency. Any insights derived from a study of 
previous clusters are more likely to apply to future 
clusters. 

• Noise Removal. Historically consistent clustering
provides greater robustness against noise by taking
previous data points into account.

• Smoothing. The true clusters shift over time. 

• Cluster Correspondence. It is generally possible 
to place today's clusters in relation to yesterday's 
clusters, so the user will still be situated within the 
historical context. 

In order to consider these constraints, two clustering
division measures are defined: snapshot quality and his-
tory cost. The snapshot quality of $C_t$, a proposed clus-
ter division, measures how well $C_t$ represents the data at
time-step $t$. The history cost of the clustering is a measure
of the distance between $C_t$ and $C_{t-1}$, the clustering used
during the previous time-step.
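The two measures can be sketched for a centroid-based clustering as follows. This is only one plausible instantiation: the squared-error snapshot quality, the nearest-centroid history cost and the trade-off weight `change_penalty` are assumptions of this example and are not specified in the text above.

```python
import numpy as np

def snapshot_quality(points, labels, centroids):
    """Higher is better: negative sum of squared distances to the assigned centroids."""
    diffs = points - centroids[labels]
    return -float(np.sum(diffs ** 2))

def history_cost(centroids_t, centroids_prev):
    """Distance between C_t and C_{t-1}: each current centroid to its nearest previous one."""
    d = np.linalg.norm(centroids_t[:, None, :] - centroids_prev[None, :, :], axis=2)
    return float(d.min(axis=1).sum())

def total_quality(points, labels, centroids_t, centroids_prev, change_penalty=1.0):
    # Assumed combination: snapshot quality minus a weighted history cost.
    return (snapshot_quality(points, labels, centroids_t)
            - change_penalty * history_cost(centroids_t, centroids_prev))

# Hypothetical data at time-step t and centroids from time-step t-1.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids_t = np.array([[0.0, 1.0], [10.0, 1.0]])
centroids_prev = np.array([[0.0, 0.0], [10.0, 0.0]])

sq = snapshot_quality(points, labels, centroids_t)
hc = history_cost(centroids_t, centroids_prev)
score = total_quality(points, labels, centroids_t, centroids_prev, change_penalty=0.5)
```

A larger `change_penalty` favors clusterings that stay close to the previous time-step, at the price of a worse fit to the current data.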

This setting is similar to incremental clustering [?],
but there are two main differences.
First, the focus is on optimizing a new quality measure 
which incorporates a deviation from history. Secondly, it 
works on-line (i.e. it must cluster the data during time- 
step t before seeing any data for time-step t + 1), while 
other frameworks work on data streams [64]. 

This framework can be added to any clustering algo-
rithm. The time complexity is $O(n^2)$ for
agglomerative hierarchical clustering, which is used for the ex-
amples in the original paper, although some authors claim
that a quasi-linear implementation [65] is possible. The
framework is presented here because it is possible
to apply its principles to all the other community discovery
algorithms presented in this survey.

There are two framework applications worth noting.
The first is FacetNet [66], in which a framework to evaluate
the evolution of the communities is developed. The second
is [67], in which the concepts of nano-communities
and k-clique-by-clique are introduced. These concepts are
useful for assessing the snapshots and historical quality of
the communities identified in various snapshots with any
given method.

4.2. RSN-BD [22]

RSN-BD (Relation Summary Network with Bregman
Divergence) is a community discovery approach focused on
examples of real-world data that involve multiple types of
objects that are related to each other. A natural represen-
tation of this setting is a k-partite graph of heterogeneous
types of nodes. This method is suitable for general k-
partite graphs and not only special cases such as [68]. The
latter has the restriction that the numbers of clusters for
different types of nodes must be equal, and the clusters
for different types of objects must have one-to-one associ-
ations.

The key idea is that in a sparse k-partite graph, two
nodes are similar when they are connected to similar nodes
even though they are not connected to the same nodes.
In order to spot this similarity, the authors produce a derived
structure (i.e. a projection) that makes these two nodes closely
connected. In order to do this, the authors of [22] add a
small number of hidden nodes. This derived structure is
called a Relation Summary Network and must be as close
as possible to the original graph. They evaluate the
distance between the two structures by linking every orig-
inal node with one hidden node, and every pair of hidden
nodes if both hidden nodes are linked by the same origi-
nal node. The distance function then sums up all the Eu-
clidean distances between the weights of the edges in the
original graph and in the transformed graph (any Bregman
divergence distance function can be used). Bregman
divergences are a class of distance measures for which
neither the triangle inequality nor symmetry is guaranteed,
and they are defined for matrices, functions and
distributions [69]. The total complexity of the algorithm,
as discussed by the authors, is $O(n^2 ck)$.
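To make the notion of a Bregman divergence concrete, the sketch below evaluates the standard definition $D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y), x - y \rangle$ for two choices of the generating function $\phi$. The function names and the example vectors are illustrative assumptions and are not taken from [22].

```python
import numpy as np

def bregman_divergence(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(phi(x) - phi(y) - np.dot(grad_phi(y), x - y))

def sq_norm(v):
    # phi(v) = ||v||^2 generates the squared Euclidean distance.
    return float(np.dot(v, v))

def sq_norm_grad(v):
    return 2.0 * v

def neg_entropy(v):
    # phi(v) = sum v log v generates the (generalized) KL divergence.
    return float(np.sum(v * np.log(v)))

def neg_entropy_grad(v):
    return np.log(v) + 1.0

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])

d_euclid = bregman_divergence(sq_norm, sq_norm_grad, p, q)
d_kl_pq = bregman_divergence(neg_entropy, neg_entropy_grad, p, q)
d_kl_qp = bregman_divergence(neg_entropy, neg_entropy_grad, q, p)
```

Here `d_kl_pq` and `d_kl_qp` differ, which illustrates that symmetry is not guaranteed for this class of measures.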

4.3. MRGC [25]

In this model, each relation between a given set of en- 
tity classes is represented as a multidimensional tensor (or 
data cube) over an appropriate domain, with the dimen- 
sions associated with the various entity classes. In addi- 
tion, each cell in the tensor encodes the relation between 
a particular set of entities and can either take real values, 
i.e., the relation has a single attribute, or itself is a vector 
of attributes. 

The general idea is that each node and each relation is
a collection of attributes. Each of these attributes is a dimen-
sion of the relational space. MRGC (Multi-way Relation
Graphs Clustering) basically tries to find a solution on one
dimension at a time. It finds the optimal clustering with
respect to each dimension by keeping every other interme-
diate result on the other dimensions fixed (thus its time
complexity is given by the number of relations times the
number of dimensions, i.e. $O(mD)$). It then evaluates the
solutions and keeps recalculating over all dimensions until
it converges. Although defined for relation graphs, this
model can also be used to identify community structures
in social networks.



MRGC operates in a multi-way clustering setting where
the objective is to map the set of entities into a (smaller) set
of clusters by using a set of clustering functions (i.e. it
is a general framework in which previous co-clustering ap-
proaches, such as [70], can be viewed as special cases). The
crucial mechanism in this problem is how to evaluate the
quality of the multi-way clustering in order to reach
convergence. In this case, the authors propose to measure
it in terms of the approximation error or the expected
Bregman distortion [71] between the original tensor and
the approximate tensor built after applying the clustering
function.



4.4. SocDim [23]

One basic (Markov) assumption in community discov- 
ery is frequently that the label of a node is only dependent 
on the labels of all its neighbors. SocDim tries to go be- 
yond this assumption by building a classifier which not 
only considers the connectivity of a node, but also assigns ad-
ditional information to its connections, i.e. a description of
a likely affiliation between social actors. This information 
is called latent social dimensions and the resulting frame- 
work is based on relational learning. 

In order to do this, two steps are performed by SocDim.
Firstly, it extracts latent social dimensions based on net-
work connectivity. It uses modularity (Section 5) in or-
der to find in the structure of the network the dimensions
in which the nodes are placed, following the homophily
theory, which states that actors sharing certain proper-
ties tend to form groups [72]. This can usually be done
in $O(n^2 \log n)$. This step may be replaced if there is al-
ready knowledge of the social dimensions. Secondly, it
constructs a discriminative classifier (one-vs-rest linear [73]
or structural [74] SVM): the extracted social dimensions
are considered as normal features (including other possi-
ble sources) in the classical supervised learning task. It is
then possible to use the predicted labels of the classifier
to reconstruct the community organization of the entities.
This is a multidimensional community discovery because
the classifier will determine which dimensions are relevant
to a class label.

This work is the basis of a further evolution [75] that
has an edge-centric view of communities (similar to the
methods classified in another section of this survey).



4.5. PMM [24]

This work was originally presented in [76] and then
evolved in [24]. It presents a variation of the modularity
approach in a multidimensional setting. The goal of the
PMM (Principal Modularity Maximization) algorithm is,
given many different dimensions, to find a concise represen-
tation of them and then to detect the correlations between
these representations. The authors call the first step "Struc-
tural Feature Extraction"; it computes modularity with the
Lanczos method, an algorithm to find eigenvalues and
eigenvectors of a square matrix [77], with complexity
$O(mn^2)$. The second step, "Cross-Dimension Integra-
tion", uses a generalized canonical correlation analysis [78].

After this step, the authors obtain a lower-dimensional
embedding, which captures the principal pattern across
all the dimensions of the network. They can then perform
k-means [56] on this embedding to find out the discrete
community assignment.
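The two steps can be sketched as follows under several simplifying assumptions: a dense eigendecomposition (`numpy.linalg.eigh`) stands in for the Lanczos method, and a plain singular value decomposition of the concatenated features stands in for the generalized canonical correlation analysis. The graphs `dim_a` and `dim_b` and all function names are hypothetical.

```python
import numpy as np

def modularity_matrix(adj):
    """B = A - k k^T / (2m) for an undirected adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    return adj - np.outer(degrees, degrees) / degrees.sum()

def structural_features(adj, n_features):
    """Top eigenvectors of the modularity matrix of one dimension."""
    # eigh returns eigenvalues in ascending order, so the last columns are the top ones.
    _, eigvecs = np.linalg.eigh(modularity_matrix(adj))
    return eigvecs[:, -n_features:]

def integrate_dimensions(adjs, n_features, n_components):
    """Concatenate per-dimension features and keep the leading left singular vectors."""
    stacked = np.hstack([structural_features(a, n_features) for a in adjs])
    u, _, _ = np.linalg.svd(stacked, full_matrices=False)
    return u[:, :n_components]

def graph(n, edges):
    adj = np.zeros((n, n))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj

# Two hypothetical dimensions over the same six nodes: two triangles, different bridges.
triangles = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
dim_a = graph(6, triangles + [(2, 3)])
dim_b = graph(6, triangles + [(0, 5)])

embedding = integrate_dimensions([dim_a, dim_b], n_features=1, n_components=1)
groups = embedding[:, 0] > 0  # a sign split replaces k-means in this tiny example
```

In this toy case the sign of the single embedding coordinate already separates the two triangles; on real data the embedding would be passed to k-means as described above.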

4.6. Infinite Relational [26]

Suppose there are one or more relations (i.e. edges) 
involving one or more types (i.e. nodes). The goal of 
the Infinite Relational Model is to partition each type into 
clusters (i.e. communities) , where a good set of partitions 
allows relationships between entities to be predicted by 
their cluster assignments. The authors' goal is to orga- 
nize the entities into clusters that relate to each other in 
predictable ways, by simultaneously clustering the entities 
and the relations. 

Formally, suppose that the observed data are $m$ rela-
tions involving $n$ types. Let $R^i$ be the $i$th relation, $T^j$
be the $j$th type, and $z^j$ be a vector of cluster assign-
ments for $T^j$. The task is to infer the cluster assignments,
and the ultimate interest lies in the posterior distribution

$P(z^1, \ldots, z^n \mid R^1, \ldots, R^m)$.

To enable the IRM to discover the number of clusters in
type $T$, the authors use a prior [79] that assigns some prob-
ability mass to all possible partitions of the type. Infer-
ences can be made using Markov chain Monte Carlo meth-
ods to sample from the posterior on cluster assignments.
This method has a very high time complexity {0{n^''D)). 
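A prior that gives some probability mass to every possible partition can be illustrated with the Chinese restaurant process (CRP). The text above does not name the prior, so treating it as a CRP, as well as the parameter name `concentration`, is an assumption of this sketch.

```python
import random

def sample_crp_partition(n_entities, concentration, seed=0):
    """Draw cluster assignments for n entities from a Chinese restaurant process."""
    rng = random.Random(seed)
    assignments = []    # cluster index of each entity
    cluster_sizes = []  # current size of each cluster
    for _ in range(n_entities):
        # Join an existing cluster with weight equal to its size,
        # or open a new cluster with weight equal to the concentration.
        weights = cluster_sizes + [concentration]
        choice = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if choice == len(cluster_sizes):
            cluster_sizes.append(1)
        else:
            cluster_sizes[choice] += 1
        assignments.append(choice)
    return assignments, cluster_sizes

assignments, sizes = sample_crp_partition(n_entities=20, concentration=1.0)
```

Because a new cluster can always be opened, the number of clusters is not fixed in advance, which is what allows the model to discover it from the data.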



4.7. Find-Tribes [27]

Find-Tribes was not explicitly developed for commu-
nity discovery purposes. However, the technique can still
be used to identify some kind of community. It is very close 
to our "action" definition of a community: the entities in 
a group tend to behave in the same way. 

As input, the authors require a bipartite graph $G = (R \cup A, E)$
of entities $R$ and attributes $A$. The entities
should connect to several attributes. The aim of the algo- 
rithm is to return those groups sharing "unusual" combi- 
nations of attributes. This restriction can be easily gener- 
alized in order to also obtain the "usual" groups as out- 
puts. 

The strategy for the desired task revolves around the
development of a good definition of "unusual". For an
entity group to be considered anomalous, the shared at-
tributes themselves need not be unusual, but their partic-
ular configuration should be. A projected non-bipartite
graph $H'(R, F)$ is built, then for each edge $f_{ij}$ a score $c_{ij}$ (the
number of attributes in the shared sequence, the number
of time steps of overlap, a probabilistic Markov chain of
attributes and so on) is computed, measuring how signifi-
cant or unusual its sequence of shared attributes is. In the
end a threshold $d$ is chosen and all edges $f_{ij}$ for
which $c_{ij} < d$ are removed. The connected components



(a) The original matrix 



(b) Reordered matrix 



Figure 3: An example of the MDL principle for matrices: the matrix 
on the left is exactly the same matrix as the one on the right, but 
reordered in order to describe it simply. 



of H' are the desired tribes and the overall complexity is 
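The procedure described in this subsection can be sketched as follows. The score used here is simply the number of shared attributes, which is only one of the options listed above; the entity names, attribute names and the function `find_tribes` are hypothetical.

```python
from itertools import combinations

def find_tribes(entity_attributes, threshold):
    """Group entities whose pairwise score c_ij is at least the threshold d."""
    entities = sorted(entity_attributes)
    # Projected graph H': an edge survives only if its score reaches the threshold.
    neighbors = {e: set() for e in entities}
    for a, b in combinations(entities, 2):
        score = len(entity_attributes[a] & entity_attributes[b])  # assumed c_ij
        if score >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)
    # The connected components of the thresholded graph are the tribes.
    tribes, seen = [], set()
    for start in entities:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        tribes.append(sorted(component))
    return tribes

# Hypothetical bipartite input: entity -> set of attributes.
records = {
    "ann": {"firm_x", "firm_y", "firm_z"},
    "bob": {"firm_x", "firm_y", "firm_z"},
    "cat": {"firm_y", "firm_z", "firm_w"},
    "dan": {"firm_q"},
}
tribes = find_tribes(records, threshold=2)
```

In this sketch isolated entities are returned as singleton components; they can be filtered out if only groups are of interest.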

4.8. AutoPart [28]

AutoPart is the basic formulation of the MDL approach
to the community discovery problem. There is a binary
matrix that represents associations between the n nodes of
the graph (and their attributes). An example of a possible
adjacency matrix is shown in Figure 3(a).



The main idea is to reorder the adjacency matrix so
that similar nodes, i.e. nodes that are connected to the
same set of nodes, are grouped with each other. The adja-
cency matrix should then consist of homogeneous rectan-
gular/square blocks of high (low) density, representing
the fact that certain node groups have more (fewer) connec-
tions with other groups (Figure 3(b)),
and these blocks can be encoded with a great compression of the data.
The aim of the algorithm is to identify the best grouping
that minimizes the cost (compression) function [80].

A trade-off point must therefore be identified that in-
dicates the best number of groups $k$. The authors solved
this problem using a two-step iterative process: first, they
find a good node grouping $G$ for a given number of node
groups $k$ that minimizes entropy; and second, they search
for the number of node groups $k$ by splitting the previ-
ously identified groups and verifying if there is a possible
gain in the total encoding cost function, at a total time
complexity of $O(mk^2)$.
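The compression argument can be illustrated with a simplified cost: each block of the grouped matrix is charged its number of cells times the binary entropy of its density. This sketch deliberately ignores the cost of describing the grouping itself, so it is not the full cost function of [80]; the function names are assumptions.

```python
import numpy as np

def binary_entropy(p):
    """Bits per cell needed to encode a block whose density of ones is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def block_encoding_cost(adj, groups):
    """Sum over all group-by-group blocks of (number of cells) * H(block density)."""
    adj = np.asarray(adj)
    groups = np.asarray(groups)
    cost = 0.0
    for g in np.unique(groups):
        for h in np.unique(groups):
            block = adj[np.ix_(groups == g, groups == h)]
            cost += block.size * binary_entropy(block.mean())
    return cost

# Two groups of three nodes: dense inside each group, empty across groups.
adj = np.kron(np.eye(2, dtype=int), np.ones((3, 3), dtype=int))
one_group = block_encoding_cost(adj, [0, 0, 0, 0, 0, 0])
two_groups = block_encoding_cost(adj, [0, 0, 0, 1, 1, 1])
```

With a single group the matrix has density 0.5 and costs one bit per cell, whereas the two-group arrangement yields perfectly homogeneous blocks whose content costs nothing to encode under this simplified measure.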

4.9. Context-specific Cluster Tree [29]

In this variant of the MDL approach, a binary $n_s \times n_d$
matrix represents a bipartite graph with $n_s$ source nodes
and $n_d$ destination nodes. The aim is to automatically con-
struct a recursive community structure of a large bipartite
graph at multiple levels, namely, a Context-specific Clus- 
ter Tree (CCT). The resulting CCT can identify relevant 
context-specific clusters. The main idea is to subdivide the 
adjacency matrix into tiles, or "contexts", with a possible 
reordering of rows and columns, and to compress them, ei- 
ther as-is (if they are homogeneous enough) or by further 
subdividing. 

The entire graph is initially considered as a single community.
If testing its possible compression with a total encoding
cost function shows that the best representation of the
considered (sub)graph is the random graph, then the community
cannot be split into two sub-communities. In fact, by definition
the random graph has no community structure at all. Oth-
erwise, the graph is split and the algorithm is reapplied
recursively. Each edge is visited once for each subdivision
(thus the complexity is $O(mk)$). The result is a tree of
communities in which the bottom levels are a context spe-
cialization of the generic communities at the top of the
tree.

Figure 4: An example of a graph which can be partitioned with a
notion of internal density between its nodes.

This idea of recursive clustering is also applied to the stream-
ing setting, although with a number of parameters.

This is a hierarchical evolution of the existing flat method
described in [?].

4.10. Timefall [30]

Timefall is an MDL approach that can be described
as parameter-free network evolution tracking. Given $n$
time-stamped events, each related to several of $m$ items,
it simultaneously finds (a) the communities, that is, item-
groups (e.g., research topics and/or research communities),
(b) a description of how the communities evolve over
time (e.g., appear, disappear, split, merge), and (c) a se-
lection of the appropriate cut-points in time when existing
community structures change abruptly.

The adjacency matrix representing the graph is split
according to the row timestamps. Columns are then clus-
tered with a Cross Association algorithm [58], which is the
basis of the MDL community discovery algorithms. The
MDL principle is used again to connect the column clus-
ters of the matrices across the split rows: if two column
clusters can be encoded together with a low encoding cost
then they are connected, ignoring time points with little
or no changes. The time complexity is $O(mk)$.

5. Internal Density 

For this group of approaches, the underlying meta def- 
inition is: 



Meta Definition 3 (Dense Community). A dense com- 
munity in a complex network is a set of entities that are 
densely connected. In order to be densely connected, a 
group of vertices must have a number of edges significantly 
higher than the expected number of edges in a random 
graph (which has no community structure). 

The following meta procedure is generally shared by 
the algorithms in this category: 

Meta Procedure 2. Given a graph, try to expand or col- 
lapse the node partitions in order to optimize a given den- 
sity function, stopping when no increment is possible. 

Figure 4 shows a network in which the identified com-
munities are significantly denser than a random graph with
the same degree distribution.

A key concept for satisfying this meta definition is
modularity [83]. Briefly, consider dividing the graph into
$c$ non-overlapping communities. Let $c_i$ denote the com-
munity membership of vertex $v_i$, and let $k_i$ represent the degree
of vertex $i$. Modularity is like a statistical test in which
the null model is a uniform random graph model. In this
model one entity connects to others with uniform proba-
bility. For two nodes with degree $k_i$ and $k_j$ respectively,
the expected number of edges between the two in a uni-
form random graph model is $\frac{k_i k_j}{2m}$, where $m$ is the number
of edges in the graph. Modularity measures how far the
interaction deviates from a uniform random graph with
the same degree distribution. It is defined as:

$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j),$$

where $\delta(c_i, c_j) = 1$ if $c_i = c_j$ (i.e. the two nodes are
in the same community), and 0 otherwise, and $A_{ij}$ is the
number of edges between nodes $i$ and $j$. A larger modular-
ity indicates a denser within-group interaction. Note that
$Q$ could be negative if the vertices are split into bad clus-
ters. $Q > 0$ indicates that the clustering captures some
degree of community structure. Essentially, the aim is to
find a community structure such that $Q$ is maximized.
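As a worked example of the definition of modularity, the following sketch evaluates $Q$ on a small hypothetical graph made of two triangles joined by a single edge. The graph and the function name `modularity` are assumptions made for illustration.

```python
import numpy as np

def modularity(adj, communities):
    """Q = 1/(2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j)."""
    adj = np.asarray(adj, dtype=float)
    communities = np.asarray(communities)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()
    same = communities[:, None] == communities[None, :]
    return float(((adj - np.outer(degrees, degrees) / two_m) * same).sum() / two_m)

# Two triangles (0,1,2) and (3,4,5) connected by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])   # the two triangles as communities
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])  # everything in one community
```

For this graph $m = 7$, and the two-triangle partition gives $Q = 5/14 \approx 0.357$, while placing all nodes in a single community gives $Q = 0$.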

Modularity is involved in the community discovery prob-
lem on two levels. Firstly, it can quantify how good a given
network partition is. It gives a measure of the quality of the
partition even without any knowledge of the actual com-
munities of the network. This is especially suitable for
very large networks. On the other hand, modularity is not
the perfect solution for evaluating a proposed community
partition. It suffers from well known problems, in partic-
ular the resolution problem. Modularity fails to identify
communities smaller than a scale that depends on the total
size of the network and on the degree of interconnectedness
of the communities, even in cases where modules are un-
ambiguously defined. Furthermore, with modularity only
communities extracted according to the meta definition
proposed in this section can be evaluated. For any other
definition of community, an evaluation based on modularity
is not very meaningful. For an extensive
review of the known problems of modularity see [13].

The second level of the modularity usage in the graph 
partitioning task is represented by community discovery 
algorithms that are based on modularity maximization. 
These algorithms suffer from the aforementioned problems 
of the usage of modularity as quality measures. However, 
modularity maximization is a very prolific field of research, 
and there are many algorithms relying on heuristics and 
strategies for finding the best network partition. 

We will present the main example of a modularity-
based approach, providing references for minor modularity
maximization algorithms. A good review of the eigenvec-
tor modularity based work is in [85].

Modularity is not the only cost function that is able
to quantify whether a set of entities is more related than
expected and thus can be considered as a community. The
other reviewed methods that rely on different techniques,
but share the same meta definition of community proposed
in this section, are: MetaFac [31], a hypergraph factor-
ization technique; a physical-chemical algorithm using a
Bayesian approach [32]; a local density-based approach
called LA $\to$ IS$^2$ [33]; and another proposed function used
to measure the internal local density of a cluster [34].

Optimizing a density function is suitable for many graph 
representations such as directed graphs and weighted graphs. 
However in addition to modularity problems, there are 
other weak points. For example, more complex structures 
are not tractable in this approach such as multidimen- 
sional networks. If multiple different qualitative relations 
are present in a network, how should a consistent value of 
"multirelational density" be computed? There are some 
works that scratch the surface of the ambiguity of den-
sity in multidimensional networks [86]; however, given the
current situation none of these approaches can be used in
pure multidimensional settings.

5.1. Modularity [?]

Finding a partition that provides the maximum value
of modularity is an NP-hard problem. Many greedy
heuristics have therefore been proposed. After a pioneer-
ing work proposing modularity [87], Newman presented
an efficient strategy for modularity maximization, namely
repeatedly merging the two communities whose amalga-
mation produces the largest increase in $Q$. This produces
a dendrogram representing the hierarchical decomposition
of the network into communities at all levels, which must
be cut at the modularity peak in order to obtain the com-
munities, as depicted in Figure 5.

Figure 5 also shows another problem of modularity
maximization heuristics. It has been discovered that mod-
ularity does not have a single peak given all the possible
partitions, but there are several local optima. Moreover,
real networks have many near-global-optima at various
places [?] (the rightmost peak in Figure 5) and we cannot
know where the algorithm locates its solution.




Figure 5: A dendrogram result for the modularity maximization algo- 
rithm, with a plot of resulting modularity values given the partition. 



The optimization proposed by Clauset et al. [18] is
to store a matrix containing only the values $\Delta Q_{ij}$,
i.e. the modularity changes when joining the
communities $i$ and $j$. The algorithm can now be defined
as follows. Calculate the initial values of $\Delta Q_{ij}$ and keep
track of the largest element of each row of the matrix $\Delta Q$.
Select the largest $\Delta Q_{ij}$ among these largest elements, join
the corresponding communities, update the matrix $\Delta Q$
and the collection of the largest elements and increment
$Q$ by $\Delta Q_{ij}$. Repeat this last step until the dendrogram is
complete. In [88] the modularity maximization approach
is adapted to the case of a directed network. We therefore
have a matrix representation of the graph, but the matrix
is not symmetric. The algorithm is based on [89].
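The greedy merging strategy can be sketched in a deliberately naive form. The sketch scans all connected pairs at every step instead of maintaining the per-row maxima described above, and it stops at the modularity peak rather than completing the dendrogram, so it illustrates the merge rule only and not the efficient data structures of [18]. The gain formula $\Delta Q_{ij} = 2(e_{ij} - a_i a_j)$ and the names `e` and `a` are the usual ones for this kind of agglomeration and are assumed here.

```python
import numpy as np

def greedy_modularity(adj):
    """Merge the pair of communities with the largest Delta Q until no merge helps."""
    adj = np.asarray(adj, dtype=float)
    two_m = adj.sum()
    # e[i, j]: fraction of edge ends joining communities i and j.
    # a[i]: fraction of edge ends attached to community i.
    e = adj / two_m
    a = e.sum(axis=1)
    members = {i: [i] for i in range(len(adj))}
    while len(members) > 1:
        best_gain, best_pair = 0.0, None
        ids = sorted(members)
        for x, i in enumerate(ids):
            for j in ids[x + 1:]:
                if e[i, j] > 0:  # only connected communities can increase Q
                    gain = 2.0 * (e[i, j] - a[i] * a[j])
                    if gain > best_gain:
                        best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            break  # modularity peak reached
        i, j = best_pair
        # Merge community j into community i.
        e[i, :] += e[j, :]
        e[:, i] += e[:, j]
        a[i] += a[j]
        e[j, :] = 0.0
        e[:, j] = 0.0
        members[i].extend(members.pop(j))
    return sorted(sorted(m) for m in members.values())

# Two triangles connected by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

communities = greedy_modularity(adj)
```

On this small graph the merges stop once the two triangles have been formed, because joining them would decrease $Q$.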

More recent works also apply the modu-
larity approach to overlapping communities [90]. A local
evaluation of modularity has also been proposed, by di-
viding the graph into known, boundary and unexplored
sets. Two more implementations of modularity-based al-
gorithms can be found in [91].

Another optimization of modularity-based approaches
is presented in [92]. This is basically a divisive algorithm
that optimizes the modularity $Q$ using a heuristic search.
This search is based on a measure ($\lambda$) that depends on the
node degree, and its normalization involves all the links
in the network after summation. The node selected in
the original Extremal Optimization algorithm [93] is always
the node with the worst $\lambda$-value. There is a $\tau$-EO ver-
sion [94] that is less sensitive to different initializations
and allows escape from local maxima. A number of other
optimization strategies have been proposed (size reduction
[95], simulated annealing [96]).

Finally, we present the last greedy approach working
with the classical definition of modularity [97]. The previ-
ous largest graph used for modularity testing had 5.5 mil-
lion nodes [98]; with this improvement it is possible to scale
up to 100 million nodes. The algorithm is divided into two
phases that are repeated iteratively. For each node $i$ the
authors consider the neighbors $j$ of $i$ and evaluate the gain
in modularity that would take place by removing $i$ from
its community and by placing it in the community of $j$.
The node $i$ is then placed in the community for which this
gain is maximum, and this is repeated until no individual
move can improve the modularity. The second phase consists in building a
new network whose nodes are now the communities found
during the first phase. It is then possible to reapply the
first phase to the resulting weighted network and to iterate.
This method has been tested on the UK-Union WebGraph
[99], on co-citation networks [100], and on mobile phone
networks.
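The first phase of this approach can be sketched as follows. For clarity the sketch recomputes the full modularity for every candidate move, which is far slower than the incremental gain used in [97], and it omits the second (aggregation) phase entirely; the function names are assumptions.

```python
import numpy as np

def modularity(adj, labels):
    """Q for an undirected adjacency matrix and a vector of community labels."""
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()
    same = labels[:, None] == labels[None, :]
    return float(((adj - np.outer(degrees, degrees) / two_m) * same).sum() / two_m)

def local_moving_phase(adj):
    """First phase only: move single nodes to neighboring communities while Q improves."""
    adj = np.asarray(adj, dtype=float)
    labels = np.arange(len(adj))  # every node starts in its own community
    improved = True
    while improved:
        improved = False
        for i in range(len(adj)):
            best_label, best_q = labels[i], modularity(adj, labels)
            for j in np.flatnonzero(adj[i]):
                trial = labels.copy()
                trial[i] = labels[j]
                q = modularity(adj, trial)
                if q > best_q + 1e-12:  # accept strict improvements only
                    best_label, best_q = labels[j], q
            if best_label != labels[i]:
                labels[i] = best_label
                improved = True
    return labels

# Two triangles connected by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

labels = local_moving_phase(adj)
```

Each accepted move strictly increases $Q$, so the loop terminates; on this graph it ends with one community per triangle.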

A particularly interesting modularity framework is Mul-
tislice modularity [101]. The authors extend the null model
of modularity (the random graph) to the novel multiplex
(i.e. multidimensional) setting. They use several gener-
alizations, namely an additional parameter that controls
coupling between dimensions, basing their operation on
the equivalence between modularity-like quality functions
and Laplacian dynamics of populations of random walk-
ers [?]. Basically they extend Lambiotte et al.'s work
by allowing multidimensional paths for the random walker
([102]), considering the different connection types with
different weights ([103]), and a different spread of these
weights among the dimensions ([104]).

In order to represent both snapshots and dimensions of
the network, the authors use slicing. Each slice $s$ of a net-
work is represented by adjacencies $A_{ijs}$ between nodes $i$ and
$j$. The authors also specify inter-slice couplings $C_{jrs}$ that
connect node $j$ in slice $r$ to itself in slice $s$. They notate the
strengths of each node individually in each slice, which yields a
slice strength $\kappa_{js}$ and an
associated multislice null model, and from these a multislice
extended definition of modularity.

In this extension $\gamma_s$ is the resolution parameter, which may
or may not be different for each slice. If $\gamma_s = 1$ for every $s$,
then this definition reduces to the usual interpretation
of modularity as a count of the total weight of intra-slice
edges minus the weight expected at random. Otherwise
inter-slice coupling $C_{jsr}$ is considered. $C_{jsr}$ takes values
from 0 to $\infty$. If $C_{jsr} = 0$ we again recover the usual
modularity definition. Otherwise the quality-optimizing
partitions force the community assignment of a node to re-
main the same across all slices in which that node appears.
In addition the multislice quality is reduced to that of an
adjacency matrix summed over the contributions from the
individual slices with a null model that respects the degree
distributions of the individual contributions. The general-
ity of this framework also enables different weights to be
included across the $C_{jsr}$ couplings. After defining the new
quality function, the algorithm needed to extract commu-
nities can be one of many modularity-based algorithms.

Figure 6: A third-order tensor.
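For reference, the multislice quality function is commonly written in the form below. This is a hedged reconstruction from the multislice modularity literature, not a formula recovered from the text of this section: the symbols $k_{js}$, $c_{js}$, $m_s$, $\mu$ and $g_{is}$ are assumptions that do not appear above.

$$Q_{\mathrm{multislice}} = \frac{1}{2\mu} \sum_{ijsr} \left[ \left( A_{ijs} - \gamma_s \frac{k_{is} k_{js}}{2 m_s} \right) \delta_{sr} + \delta_{ij} C_{jsr} \right] \delta(g_{is}, g_{jr}),$$

where $k_{js} = \sum_i A_{ijs}$ is the intra-slice strength of node $j$ in slice $s$, $c_{js} = \sum_r C_{jsr}$ is its inter-slice strength, $\kappa_{js} = k_{js} + c_{js}$, $2 m_s = \sum_j k_{js}$, $2\mu = \sum_{jr} \kappa_{jr}$, and $g_{is}$ is the community of node $i$ in slice $s$. Under these assumptions, setting $C_{jsr} = 0$ for all $j$, $s$, $r$ removes the second term, so each slice contributes an independent modularity-like term with resolution $\gamma_s$.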

In Table 2 we merged all modularity approaches into
the single "Modularity" row. One caveat is that, depend-
ing on the implementation, not all the features may be
returned (for example only the Multislice implementation is
able to consider multidimensionality).

5.2. MetaFac [31]

In this work, the authors introduce the concept of meta- 
graph. The metagraph is a relational hypergraph to repre- 
sent multi-relational and multi-dimensional social data. In 
practice, there are entities which connect to different kinds
of objects in different ways (e.g. in social media through
tagging, commenting or publishing a photo, video or text).
The aim is to discover a latent community structure in the 
metagraph, for example the common context of user ac- 
tions in social media networks. In other words the authors 
are interested in clusters of people who interact with each 
other in a coherent manner. In this model, a set of entities 
of the same type is called a facet. An interaction between 
two or more facets is called a relation. 

The idea of the authors is to use an M-way hyperedge 
to represent the interactions of M facets: each facet as a 
vertex and each relation as a hyperedge on a hypergraph. 
A metagraph defines a particular structure of interactions 
between facets (groups of entities of the same type), not 
between facet elements (the entities themselves). In or- 
der to do so, the metagraph is defined as a set of data 
tensors. A tensor is an array with $N$ dimensions (see
Figure 6 for an intuitive representation of a three-dimensional
tensor). This is the mathematical and computer science
definition of tensors; for the notion of tensor in physics and
engineering see [105]. For an extensive review of tensors,
tensor decomposition and their applications and tools see
[106] (that work also provides some examples of possible
applications of tensor decompositions: signal processing
[107], numerical linear algebra [108] and, closer to our
area of interest, data mining [109, 110], graph analysis
tasks [111, 112] and recommendation systems [113]).
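
As a purely illustrative sketch of the "array with N dimensions" view, the following builds a dense third-order tensor from index triples. The function name `build_tensor` and the (user, tag, photo) reading of the three modes are assumptions made for this example; they are not taken from the MetaFac paper.

```python
def build_tensor(triples, shape):
    """Dense third-order tensor (nested lists) counting (i, j, k) interactions."""
    n_i, n_j, n_k = shape
    tensor = [[[0] * n_k for _ in range(n_j)] for _ in range(n_i)]
    for i, j, k in triples:
        tensor[i][j][k] += 1
    return tensor


# Hypothetical relation: (user, tag, photo) index triples.
interactions = [(0, 1, 2), (0, 1, 2), (1, 0, 0)]
tensor = build_tensor(interactions, shape=(2, 2, 3))

# Fixing the first index yields one tag-by-photo matrix per user.
user_0_slice = tensor[0]
```

A real metagraph would hold one such tensor per relation, and a sparse representation would normally replace the nested lists.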

Given the metagraph and its defined data tensors, the
authors apply a tensor decomposition and factorization
operation, which is a very hard task with a number of known
issues. To the best of our knowledge, memory and time
efficient techniques, such as [114], have been developed
only recently. In the metagraph approach the tensor
decomposition can also be viewed as a dynamic analysis,
when the sets of tensors are temporally annotated and
the resulting core tensor refers to a specific time-step $t$.
This is called metagraph factorization (MF) for time evolving
data. Finally, the MF problem can be stated in terms
of optimization, i.e. minimizing a given cost function,
thus obtaining facet communities (with a time complexity
of $O(mnD)$).

5.3. Variational Bayes fs^j 

In this work, the authors model a complex network 
as a physical system, and then the problem of assigning 
each node to a module (inferring the hidden membership 
vector) in the network is tackled by solving the disorder- 
averaged partition function of a spin-glass. 

The authors define a joint probability by considering
the number of edges present and absent within and among
the $K$ communities of a network. Whereas traditional
methods [115] need $K$ to be specified, this one is parameter
free: the most probable number of modules (i.e. occupied
spin states) is determined as $K^{*} = \arg\max_{K} p(K \mid A)$.
Such methods also need to infer posterior distributions over
the model parameters (i.e. coupling constants and chemical
potentials) $p(\vec{\pi}, \vec{\theta} \mid A)$ and the latent module
assignments (i.e. spin states) $p(\vec{\sigma} \mid A)$. The
computationally intensive solution is tackled using the
variational Bayes approach [116].

This is a special case of the more general Stochastic
Block Model, which is a family of solutions that reduces
the community discovery problem to a statistical inference
one. Historical approaches are [117, 118], while other
algorithms with the same technique, but different community
definitions, are presented in different sections of this paper.

5.4. LA → IS$^2$ [M]

In this work, the authors adopt the following definition
of a community: a group $C$ of actors in a social network
forms a community if its communication density function
achieves a local maximum in the collection of groups that
are close to $C$ [119]. Basically, a group is a community
if adding any new member to, or removing any current
member from, the group decreases the average number of
communication exchanges.

This work is an evolution of [120]. It is built on two
distinct phases: Link Aggregate (LA) and the real core
of community detection (IS$^2$). The authors need a
two-step approach because the IS$^2$ algorithm performs well
at discovering communities given a good initial guess, for
example when this guess is the output of another clustering
algorithm, in this case Link Aggregate (LA).

In LA, the nodes are ordered according to some criterion,
for example decreasing PageRank [121], and then
processed sequentially according to this ordering. A node
is added to a cluster if adding it improves the cluster
density. If the node is not added to any cluster, it creates a
new cluster. The complexity of this stage is $O(mk + n)$.

IS$^2$ explicitly constructs a cluster that is a local
maximum w.r.t. a density metric by starting at a seed
candidate cluster and updating it by adding or deleting one
node at a time as long as the metric strictly improves. The
algorithm can be applied to the results of any other
clustering technique, thus making this approach useful as a
general framework to improve some incomplete, or approximate,
results.
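
The add-or-delete loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the density metric used here (internal edges over internal plus boundary edges) is an assumption chosen for simplicity, and the names `density` and `iterative_scan` are hypothetical.

```python
def density(adj, cluster):
    """Assumed metric: internal edges / (internal + boundary edges)."""
    w_in = w_out = 0
    for u in cluster:
        for v in adj[u]:
            if v in cluster:
                w_in += 1   # every internal edge is seen from both endpoints
            else:
                w_out += 1
    w_in //= 2
    total = w_in + w_out
    return w_in / total if total else 0.0


def iterative_scan(adj, seed, metric=density):
    """Toggle one node at a time while the metric strictly improves."""
    cluster = set(seed)
    best = metric(adj, cluster)
    improved = True
    while improved:
        improved = False
        for v in sorted(adj):
            candidate = cluster ^ {v}   # add v if absent, remove it if present
            if not candidate:
                continue
            score = metric(adj, candidate)
            if score > best:
                cluster, best = candidate, score
                improved = True
    return cluster
```

Because every accepted move strictly increases the metric and the number of subsets is finite, the loop terminates at a local maximum, which depends on the seed (here, for instance, the output of a first phase such as LA).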



5.5. Local Density fSJ, 

In this work, the authors apply the classical approach
which characterizes this category, i.e. to define a density
quality measure to be optimized and then recursively
merge clusters if this move produces an increase in the
quality function. Here this function is based on the internal
degree of a cluster $C$, i.e. the number of edges connecting
vertices in $C$ to each other,
$\deg_{int}(C) = |\{(u, v) \in E \mid u, v \in C\}|$.
Thus it is possible to define the local density of a cluster as

$$\delta_l(C) = \frac{2 \deg_{int}(C)}{|C| (|C| - 1)}.$$

Optimizing $\delta_l \in [0, 1]$ alone makes small cliques superior
to larger but slightly sparser sub-graphs, which is often
impractical. For clusters to only have a few connections
to the rest of the graph, one may optimize the relative
density

$$\delta_r(C) = \frac{\deg_{int}(C)}{\deg_{int}(C) + \deg_{ext}(C)},$$

where $\deg_{ext}(C) = |\{(u, v) \in E \mid u \in C, v \in V \setminus C\}|$. The
final quality measure used is $f(C) = \delta_l(C)\,\delta_r(C)$. A good
approximation of the optimal cluster for a given vertex
can be obtained by a local search, guided with simulated
annealing [122].
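
The three measures can be computed directly from an undirected edge list. The sketch below only evaluates the quality of a given cluster; the local search with simulated annealing is not shown, and the function names are illustrative.

```python
def internal_degree(edges, cluster):
    """Number of edges with both endpoints inside the cluster."""
    return sum(1 for u, v in edges if u in cluster and v in cluster)


def external_degree(edges, cluster):
    """Number of edges with exactly one endpoint inside the cluster."""
    return sum(1 for u, v in edges if (u in cluster) != (v in cluster))


def local_density(edges, cluster):
    n = len(cluster)
    if n < 2:
        return 0.0
    return 2 * internal_degree(edges, cluster) / (n * (n - 1))


def relative_density(edges, cluster):
    d_int = internal_degree(edges, cluster)
    total = d_int + external_degree(edges, cluster)
    return d_int / total if total else 0.0


def quality(edges, cluster):
    """Product of local and relative density."""
    return local_density(edges, cluster) * relative_density(edges, cluster)
```

On two 4-cliques joined by a single edge, one whole clique scores 6/7, while a triangle inside it scores only 0.5: the triangle is equally dense but has three outgoing edges, which is exactly the effect the relative density is meant to penalize.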



6. Bridge Detection 

The meta definition of community for the algorithms 
in this section is: 
Figure 7: An example of a graph that can be partitioned by
identifying a "bridge".



Meta Definition 4 (Isolated Community). An isolated 
community in a complex network is a component of the 
network obtained by removing all the sparse bridges from 
the structure that connect the dense parts of the network. 

Usually, approaches in this category implement the fol- 
lowing meta procedure: 

Meta Procedure 3. Rank nodes and edges in the net- 
work according to a measure of their contribution to keep- 
ing the network connected and then remove these bridges 
or avoid expanding the community by including them. 

The bridge identified by the arrow in Figure 7 is a
perfect example of an edge to be removed in order to
decompose the network into disconnected components which
represent our communities. The main focus for these
approaches is how to find these bridges (which can be either
nodes or edges) inside the network. The most popular
approach in this category is to use a centrality measure. No
assumptions at all are made about the internal density of
the identified clusters.

In social network analysis, a centrality measure is
a metric defined in order to obtain a quantitative
evaluation of the structural power of an entity in a network
[123]. An entity does not have power in the abstract; it
has power because it can dominate others. There are a
number of measures defined to capture the power of an
entity in a network. These include: degree centrality, where
actors who have more ties to other actors may have more
favorable positions; closeness centrality, where the closer an
entity is to the other entities in the network, the more power
it has; and betweenness centrality, where the most important
entity in the network is the one present in the majority of
the shortest paths between all other entities.

Here we focus on two methods based on an edge
definition of the traditional node betweenness centrality: the
very first edge betweenness community discovery algorithm,
which has recently been the focus of further evolutions,
i.e. a general approach that uses split betweenness in order
to obtain an overlapping community discovery framework
[35]. We then also consider two alternative methods
[36, 37] which try to detect the bridges by expanding the
community structure and computing a community fitness
function.

Figure 8: An intuitive example of the bridge detection approach. In
this graph the edge width is proportional to the edge betweenness
value. Wider edges are more likely to be a bridge between
communities.

As can be seen in Table 2, these algorithms are good
at finding overlapping partitions (the original edge
betweenness algorithm is not, but the CONGA strategy
enables it to detect overlapping clusters). The weak
points of this approach appear when dealing with dynamic,
multidimensional or incremental structures. We are not
able to prove this point in the experimental section so we
will use an intuitive explanation. In order to compute
the fitness function to detect bridges, it is necessary to
start from the assumption that the algorithm has a complete
representation of all connections among the clusters,
which may be hard in an incremental setting. Furthermore,
the routing algorithms that are needed to compute
the betweenness or closeness centrality place some
constraints on the structure of the network which are not
satisfied in a multidimensional setting. Consider a network
with two dimensions and a rule that states that jumping
from one dimension to another lowers the cost of the path.
We thus have negative cycles and a significant shortest
path cannot be computed (since in Bellman-Ford's
algorithm, disallowing edge repetition, it is possible to obtain a
shortest path that will always cross all the negative cycles
it can, thus destroying the concept of bridge [124]).

6.1. Edge Betweenness Q/ 

The main assumption of this work is that if a network
contains communities or groups that are only loosely
connected by a few inter-group edges, then all the shortest
paths between different communities must go along one
of these edges. In order to find these edges, which are
most "between" other pairs of vertices, the authors
generalize Freeman's betweenness centrality [125] to edges, and
define the "edge betweenness" of an edge as the number of
shortest paths between pairs of vertices that run along it.
Figure 8 depicts an example, where the width of the edges
is proportional to their edge betweenness. As can be seen,
the higher edge betweenness values are taken by the edges
between communities. By removing these edges, it is
possible to separate the groups from one another and thus
reveal the underlying community structure of the graph.

This is one of the first community discovery algorithms
developed after the renewed interest in social network
analysis that started in the late 1990s. Previously the
traditional graph partitioning approach constructed
communities by adding the strongest edges to an initially empty
vertex set (as in hierarchical clustering [126]). Here, the
authors construct communities by progressively removing
edges from the original graph.

While a single computation of edge betweenness in the
classical implementation is $O(mn)$, a speed-up for parallel
systems that is linear [106] has recently been proposed.
Since betweenness is recomputed after each edge removal,
without the parallel algorithm the worst case time
complexity is $O(m^2 n)$. There are slight variations of this
method using different centrality measures [127, 128].
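
The remove-and-recompute loop can be sketched in pure Python as follows. The betweenness routine uses a Brandes-style breadth-first accumulation for unweighted, undirected, simple graphs, which is one standard way to reach the $O(mn)$ bound but is an assumption of this sketch rather than a detail given above. All function names are illustrative, and only the first split into more components is performed.

```python
from collections import deque


def edge_betweenness(adj):
    """Number of shortest paths running along each undirected edge."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path shares from the farthest vertices back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Each unordered pair of vertices was counted from both endpoints.
    return {edge: value / 2.0 for edge, value in score.items()}


def components(adj):
    seen, result = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(w for w in adj[v] if w not in comp)
        seen |= comp
        result.append(comp)
    return result


def first_split(adj):
    """Remove highest-betweenness edges until a new component appears."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        scores = edge_betweenness(adj)
        if not scores:
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)
```

Applying `first_split` recursively to each returned component would yield the full hierarchy of divisions.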



6.2. CONGA [35] 

CONGA (Cluster-Overlap Newman Girvan Algorithm)
is based on the well-known edge betweenness community
discovery algorithm described in Section 6.1. It adds
the ability to split vertices between communities, based on
the new concept of "split betweenness".

The split betweenness [129] of a vertex $v$ is the number
of shortest paths that would pass between the two parts
of $v$ if it were split. There are many ways to split a vertex
into two; the best split is the one that maximizes the split
betweenness. Basically, with the following split operation,
any disjoint community discovery algorithm can be applied
and returns overlapping partitions [130]:

1. Calculate edge betweenness of edges and split be- 
tweenness of vertices. 

2. Remove edge with maximum edge betweenness or 
split vertex with maximum split betweenness, if greater. 

3. Recalculate edge betweenness and split betweenness. 

4. Repeat from step 2 until no edges remain. 

Given a relaxed assumption on the edge betweenness 
computation, the total time complexity of CONGA is 0{n log 

6.3. L-Shell [36]

In the L-Shell algorithm, the idea is to expand a community
as much as possible, stopping the expansion whenever
the network structure does not allow any further expansion,
i.e. the bridges are reached.

The key concept is the $l$-shell, the group of vertices at
distance $l$ from a starting vertex, whose aim is to grow and
occupy an entire community while two quantities are computed:
the emerging degree and the total emerging degree. The emerging
degree of a vertex is defined as the number of edges that
connect that vertex to vertices that the $l$-shell has not
already visited as it expanded from the previous $(l-1)$-,
$(l-2)$-, ... shells. The total emerging degree $K_j^l$ of an
$l$-shell is thus the sum of the emerging degrees of all
vertices on the leading edge of the $l$-shell.

For a starting vertex $j$ the algorithm starts an $l$-shell,
$l = 0$, at vertex $j$ (adding $j$ to the list of community members)
and computes the total emerging degree of the shell. Then
it spreads the $l$-shell, $l = 1$, adds the neighbors of $j$ to
the list, and computes the new total emerging degree. Now
it can compute the change in the emerging degree of the
shell. If the total emerging degree has increased by less than
a given threshold $\alpha$, then a community has been found.
Otherwise it increases the size of the shell (setting $l = l + 1$)
until $\alpha$ is crossed or the entire connected component is
added to the community list. As can be seen, for each node
we have a quadratic problem, i.e. the total time complexity is
$O(n^3)$. The assumption is that a community is a structure
in which the total emerging degree cannot be significantly
increased, i.e. the vertices at the border of the community
have few edges outside it and these edges are the bridges
among different communities.
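
A minimal sketch of the shell expansion from a single starting vertex is given below. It assumes that the "change" in total emerging degree is measured as the ratio between consecutive shells and that the shell which triggers the stop is kept in the community; both choices, as well as the function name, are assumptions of this example rather than details stated above.

```python
def l_shell_community(adj, start, alpha):
    """Grow shells around `start` until the total emerging degree stalls."""
    community = {start}
    frontier = {start}
    prev_k = len(adj[start])          # total emerging degree of the 0-shell
    while prev_k > 0:
        # Next shell: unvisited neighbours of the current leading edge.
        shell = {v for u in frontier for v in adj[u]} - community
        community |= shell
        # Edges leaving the new shell towards vertices not yet visited.
        k = sum(1 for u in shell for v in adj[u] if v not in community)
        if k / prev_k < alpha:        # assumed stopping rule (ratio test)
            break
        frontier, prev_k = shell, k
    return community
```

On two 4-cliques joined by one edge, starting inside a clique with `alpha = 0.5` returns that clique, because the only emerging edge of the first shell is the bridge. Starting from a bridge endpoint returns the whole graph, which illustrates how sensitive the result is to the starting vertex.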

6.4. Internal-External Degree [37]

An approach close to $l$-shell starts from the similar
basic assumption that communities are essentially local
structures, involving the nodes belonging to the modules
themselves plus at most an extended neighborhood of
them. The fitness chosen here is the total internal degree
of the nodes divided by the sum of internal and external
degrees raised to the power of a positive real-valued
parameter ($\alpha$). Given a fitness function, the fitness of a
node $A$ with respect to sub-graph $\mathcal{G}$, $f_{\mathcal{G}}^{A}$, is defined
as the variation of the fitness of sub-graph $\mathcal{G}$ with and
without node $A$. The process of calculating the fitness of
the nodes and then joining them together in a community
stops when the nodes examined in the neighborhood of
$\mathcal{G}$ all have negative fitness, i.e. their external edges are
all bridges, after a total time complexity of $O(n^2 \log n)$.

Large values of $\alpha$ yield very small communities, whereas
small values deliver large modules. For $\alpha = 1$ this method
closely recalls [127], which is another algorithm that falls
into this category. Going from $\alpha = 0.5$ to $\alpha = 2$ reveals the
hierarchical structure of the network.
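
The fitness just described, and the greedy growth it drives, can be sketched as follows. This is a simplified illustration: it only adds nodes and does not re-examine members for removal, and the function names and the default value of the exponent are assumptions of this example.

```python
def community_fitness(adj, group, alpha=1.0):
    """Internal degree divided by (internal + external degree) ** alpha."""
    k_in = k_out = 0
    for u in group:
        for v in adj[u]:
            if v in group:
                k_in += 1     # internal degree counts both endpoints
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def node_fitness(adj, group, node, alpha=1.0):
    """Variation of the group fitness with and without `node`."""
    with_node = community_fitness(adj, group | {node}, alpha)
    without_node = community_fitness(adj, group - {node}, alpha)
    return with_node - without_node


def grow_community(adj, seed, alpha=1.0):
    """Add the fittest neighbour until all neighbours have negative fitness."""
    group = {seed}
    while True:
        neighbours = {v for u in group for v in adj[u]} - group
        scored = [(node_fitness(adj, group, v, alpha), v) for v in neighbours]
        if not scored:
            return group
        best, v = max(scored, key=lambda pair: pair[0])
        if best < 0:
            return group
        group.add(v)
```

Re-running `grow_community` with different values of `alpha` is the simplest way to observe the change in community size discussed above.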


7. Diffusion

A diffusion is a process in which vertices or edges of
a graph are randomly designated as either "occupied" or
"unoccupied", and the various properties of the resulting
patterns of vertices are then queried [131] (see Figure 9, which
also highlights the lack of clear bridges between communities
or any density difference between the inside and the
outside of clusters).

Figure 9: An example of a graph partitioned with a diffusion process.

Figure 10: Possible steps of a label propagation-based community
discoverer.

A generalization of a diffusion process
can be used for community discovery in complex networks,
according to the following definition of community:

Meta Definition 5 (Diffusion Community). A diffusion
community in a complex network is a set of nodes that
are grouped together by the propagation of the same
property, action or information in the network.

The definition of the meta procedure followed by algo- 
rithms in this category is thus: 

Meta Procedure 4. Perform a diffusion or percolation 
procedure on the network following a particular set of trans- 
mission rules and then group together any nodes that end 
up in the same state. 

According to this meta definition, a community can
also be defined as a set of entities influenced by a fixed set
of sources. This is important because algorithms which
were not explicitly developed as approaches for graph
partitioning can also be considered community discovery
methods. Basically, this definition of the problem overlaps
with another well-known data mining problem: influence
spread and flow maximization [1], which is often used for
viral marketing [132]. Preliminary ideas can be found in
[133], even if only a novel centrality measure is defined
and the approach can be mapped onto the Newman edge
betweenness algorithm. Another approach that mixes
physics and information theory is [134].

Other interesting works in viral marketing include, given a
community partition, the analysis of the group
characteristics in order to predict their evolution [135]. In addition,
it is possible to predict if a single vertex will be attached to
a group, or even classify some features (and the evolution
of these features) of a group. While it is not a community
discovery work, [135] can be used as a framework after a
community detection algorithm in order to obtain a
temporally evolving description of the identified groups.



To sum up, the classical community discovery
diffusion-based algorithms presented here are: a label propagation
technique [38], dynamic node coloring for temporally evolving
communities [39], and edge resistor algorithms that
consider the original graph as an electric circuit [40].

The influence propagation approaches reviewed here
are: an analytical description of a network representing an
exchange of information [41]; GuruMine [1], a framework
whose aim is to analyze "tribes"; DegreeDiscountIC [42],
a classical spread maximization algorithm; and a mixed
membership stochastic blockmodel algorithm [43], which
uses Bayesian inference in order to compute the final state
of the influence vectors for each node in the network.

In this category, it is natural to deal with directed com- 
munities, since the diffusion process, when dealing with in- 
formation spread, is naturally modeled following asymmet- 
ric relations. It is also intrinsically dynamic, thus many 
diffusion algorithms provide this feature in the community 
discovery solution. We found that no approach currently 
considers multidimensional networks, however we believe 
that considering different communication channels inside 
a network should be a key feature of this category. 

7.1. Label Propagation [38]

Suppose that a node $x$ has neighbors $x_1, x_2, \ldots, x_k$ and
that each neighbor carries a label denoting the community
that it belongs to. Then $x$ determines its community based
on the labels of its neighbors. A three-step example of this
principle is shown in Figure 10.

The authors assume that each node in the network
chooses to join the community to which the maximum
number of its neighbors belong. As the labels propagate,
densely connected groups of nodes quickly reach a
consensus on a unique label. At the end of the propagation
process, after a quasi-linear time complexity ($O(m + n)$),
nodes with the same labels are grouped together as one
community.

Clearly, a node with an equal maximum number of
neighbors in two or more communities can belong to both
communities, thus identifying overlapping communities. It
is easy to define an overlapping version of this algorithm
[136].
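
A minimal sketch of the disjoint version is shown below. It assumes asynchronous updates in a random order, unique initial labels, and a tie-breaking rule that keeps the current label whenever it is among the most frequent ones; these are common choices made for this example, not details given above, and the function name is illustrative.

```python
import random
from collections import Counter


def label_propagation(adj, seed=0, max_sweeps=100):
    """Each node repeatedly adopts the most frequent label among its neighbours."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}          # every node starts in its own community
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)
        changed = False
        for v in nodes:
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            if labels[v] in best:         # keep the current label on ties
                continue
            labels[v] = rng.choice(best)
            changed = True
        if not changed:
            break
    groups = {}
    for v, lab in labels.items():
        groups.setdefault(lab, set()).add(v)
    return list(groups.values())
```

Each sweep touches every edge a constant number of times, which is where the quasi-linear cost comes from when the number of sweeps stays small. The result may vary with the random seed on graphs whose communities are weakly separated.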



7.2. Node coloring [39]

Consider an affiliation network in which some individuals
form groups by attending the same event. In this
approach, which represents an evolution of [137], the base
input representation is an evolving bipartite graph of
individuals connected to events.

Various rules have been defined to connect groups over 
time and form communities of groups: 

1. In each time step, every group is a representative of 
a distinct community; 

2. An individual is a member of exactly one community 
at any one time (but can change community affilia- 
tion over time); 

3. An individual tends not to change his / her commu- 
nity affiliation very frequently; 

4. If an individual keeps changing affiliations from one 
community to another, then it is not a true member 
of any of those communities; 

5. An individual is frequently present in the group rep- 
resenting the community with which he / she is af- 
filiated. 

The authors define the community interpretation of a
graph $G$ as a function $f : V \to \mathbb{N}$. Each individual belongs
to exactly one community in each time-step, and each
group represents exactly one community. Thus, although
the affiliation can change over time, this is a disjoint
community detection algorithm, not an overlapping one. To
measure the quality of a community interpretation, the
authors use costs (whenever an individual changes color,
or it connects to groups with different colors, and so on)
to penalize violations of Rules 3 and 5. The optimization
problem is then to find the valid community interpretation
by minimizing the total cost resulting from the
individual edges, group edges and color usage. The authors
present an exhaustive global optimum algorithm with
exponential time complexity (the algorithm with dynamic
programming tries all possible colorings of the graph) and
then some heuristics, ending up with a final complexity
of $O(ntk^2)$. In [138] the authors present another set of
heuristics and optimizations.

7.3. Kirchhoff [40]

In this paper, the basic idea is to imagine each edge 
as a resistor with the same resistance. It is then possible 
to connect a virtual "battery" between chosen vertices so 
that they have fixed voltages. Having made these assump- 
tions the graph can be viewed as an electric circuit with 
a current flowing through each edge (resistor). By solv- 
ing Kirchhoff's equations, the authors obtain the voltage 
value of each node. The authors claim that, from a node's
voltage value, they are able to judge whether it belongs to
one community or another. This approach is very efficient,
since the complexity is $O(m + n)$.

A further expansion [139] applies a walk-based approach
in order to unveil the hidden hierarchical structure of the
network and identify good choices for the seed poles. The
authors then apply a very similar implementation of this
method using a Kirchhoff matrix.
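
With unit resistances, Kirchhoff's equations state that the voltage of every free node is the average of its neighbours' voltages. The sketch below solves them approximately by repeated local averaging with the two poles held at 1 and 0. The fixed number of sweeps, the 0.5 threshold used to split the nodes, and the function names are assumptions of this example, not details taken from the paper.

```python
def voltages(adj, source, sink, sweeps=100):
    """Relax node voltages with the source fixed at 1 and the sink at 0."""
    volt = {v: 0.0 for v in adj}
    volt[source] = 1.0
    for _ in range(sweeps):
        for v in adj:
            if v == source or v == sink or not adj[v]:
                continue
            # Unit resistors: the voltage is the mean of the neighbours' voltages.
            volt[v] = sum(volt[u] for u in adj[v]) / len(adj[v])
    return volt


def split_by_voltage(adj, source, sink, threshold=0.5, sweeps=100):
    """Assign nodes to the source side or the sink side by thresholding."""
    volt = voltages(adj, source, sink, sweeps)
    high = {v for v in adj if volt[v] >= threshold}
    return high, set(adj) - high
```

Each sweep costs one pass over the edges, so a bounded number of sweeps keeps the running time linear in $m + n$. On two 4-cliques joined by a bridge, with one pole in each clique, the voltage drops sharply across the bridge (from 0.75 to 0.25), which is what separates the two groups.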
Figure 11: The GuruMine data structures: the action table and the
influence graphs.



7.4. Communication Dynamic [41]

The focus of the paper is an analytical description of 
the evolution of a network, whose size is stable over time 
and represents the exchange of communication among in- 
dividuals. The authors present a locality based model for 
communication dynamics, which can be used in order to 
identify the mechanisms of community creation and evo- 
lution over time in a social network. 

Similar approaches, such as preferential attachment [140],
are applicable only when communications are open
(observable to all nodes). Instead the authors present a
locality based model which relies on two fundamental
principles. Firstly, the concept of locality reduces the set of nodes
that a node can attach to in the next time step. Secondly,
after obtaining a node's locality, the attachment
mechanism, which is used by the individual to select the nodes
in its locality to which it will connect at the next time step,
must be specified. This is a Markov chain-like approach.

For the preliminary community structure that identifies
the local environment of a node, the authors use an
existing method based on density [33]. The authors define
the blogograph as a directed, unweighted graph represent- 
ing the communication of the blog network within a fixed 
time-period. There is a vertex in the blogograph repre- 
senting each blogger and a directed edge from the author 
of any comment to the owner of the blog. The authors 
consider consecutive weekly snapshots of the network. 

The authors recorded statistics such as the numbers of vertices
and edges, the power-law degree distribution exponent, the giant
component size and so on, observing that they are stable
over time, consistent with previous observations (as in
[141] and [142]). They also provide an indicator of community
vitality over time. The goal is to produce a sequence
of graphs which simulate the connection and reconnection
of vertices and can be used for community validation.

7.5. GuruMine [1]

The aim of GuruMine is to investigate how influence
(for performing certain actions) propagates from users to
their network friends, potentially recursively, thus
identifying a group of users that behave homogeneously (i.e. a
tribe, or a community). For instance, Table 3 shows a
possible action table with two actions, $\alpha$ and $\beta$, and five users.
Figures 11(b) and 11(c) represent the influence graphs of
these two actions. U1 can be considered as a tribe leader
in both cases. However, for action $\alpha$, U1 cannot be
considered a leader if the threshold regarding the minimum
number of influenced users is equal to 4.

Since the set of influenced users is the same, we have a
"tribe leader", meaning the user leads a fixed set of users
(tribe) w.r.t. a set of actions, which can be considered a
community. The general goal is similar to recent works
such as [143, 144, 145]. However, here the input includes
not just a graph (which is not edge-weighted) but also an
action table which plays a central role in the definition of
leaders. This action table contains triples $(u, t, a)$, each
indicating that user $u$ performed action $a$ at time $t$, from
which a directed propagation graph is derived. If the
composition of the influenced graph is the same, we have a tribe.
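
One way to derive a propagation graph from the action table is sketched below. It assumes that an edge from $u$ to $v$ is created when the two users are friends and $u$ performed the action strictly before $v$; this rule, the absence of any time window, and the function names are assumptions made for illustration.

```python
def propagation_graph(friends, actions, action):
    """Directed edges u -> v where friend u performed `action` before v."""
    times = {u: t for (u, t, a) in actions if a == action}
    graph = {u: set() for u in times}
    for u in times:
        for v in friends.get(u, ()):
            if v in times and times[u] < times[v]:
                graph[u].add(v)
    return graph


def influenced(graph, user):
    """Users reachable from `user` in the propagation graph."""
    seen, stack = set(), [user]
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen
```

A user whose influenced set is the same for several actions, and large enough with respect to a chosen threshold, would then be a candidate tribe leader in the sense described above.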

Any algorithm for extracting leaders must scan the
action log table and traverse the graph (which means that the
complexity also depends on this table and is $O(TAn^2)$).
The implementation works with only one scan, with the
action log stored in chronological order. With this scan
the influence matrix IMTr(U; A) can be computed. For
tribe leaders the influence cube $Users \times Actions \times Users$
is needed, with cells containing Boolean entries indicating
whether user $v$ was influenced by user $u$ w.r.t. action $a$.
A tribe is essentially an item-set, i.e. a community with
common behavior. This phase is implemented by ExAMiner [146].
This work is part of a larger framework that also has a query
interface [147].

7.6. DegreeDiscountIC [42]

This work is in the context of the classical data min- 
ing influence spread. The problem definition consists in 
deciding who to include in the initial set of targeted users 
so that, if necessary, they influence the largest number 
of people in the network. This knowledge can be used for 
community discovery: each seed node is the head of a
community that acts uniformly, and the set of these influenced
nodes forms the community members. This work is an
implementation of the idea in [145] and an improvement of the
algorithm proposed in [MJ].

Influence is propagated in the classical network
representation of social interactions according to a stochastic
cascade model. Let $S$ be the subset of vertices selected to
initiate the influence propagation. In the independent cascade
model (IC), let $A_i$ be the set of vertices that are activated
in the $i$-th round, and $A_0 = S$. For each edge with one inactive
endpoint, there is a probability of activation proportional
to the active neighbors, and this is repeated until the
cascade cannot expand any further. Then all edges not used
for propagation are removed, and the set of influenced
vertices is simply the set of vertices reachable from $S$ in the
resulting graph $G'$.
This cascade can be evolved into a weighted model (WC),
by considering the number of inactive neighbors of an
active node and the activated neighbors of an inactive node.
A discount on the degree of these vertices is considered
if both connected nodes are part of the seed set. With
this and more finely tuned heuristics on degrees, the
authors manage to develop a well performing algorithm with
a reasonable level of complexity (equal to $O(k \log n + m)$).
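
A single run of the cascade can be simulated as follows. The sketch assumes the usual independent cascade rule, in which every newly activated vertex gets one chance to activate each inactive neighbour with the same probability `p`; the uniform probability and the function name are assumptions of this example, and the degree discount heuristic for choosing the seeds is not shown.

```python
import random


def independent_cascade(adj, seeds, p, rng=None):
    """Return the set of vertices activated by one run of the cascade."""
    rng = rng or random.Random(0)
    active = set(seeds)
    frontier = list(seeds)                 # vertices activated in the last round
    while frontier:
        newly_active = []
        for u in frontier:
            for v in adj[u]:
                # One activation attempt per (newly active u, inactive v) pair.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active
```

Read as community discovery, the returned set is the community headed by the seeds. Since a run is random, the expected spread of a seed set is normally estimated by averaging many runs.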

7.7. MMSB [43]

In the mixed membership stochastic blockmodel ap- 
proach (MMSB), the authors implement the following mech- 
anism: each node belongs to any possible community with 
a certain probability. These probabilities are then influ- 
enced by the probabilities of all other nodes. In practice, 
the influence of affiliations spreads over the network un- 
til convergence, by averaging the vector of probabilities of 
each node with the vector of the general infiuences. In 
other words, this process is equivalent to label propaga- 
tion, and instead of a simple number indicating the mem- 
bership there is a vector of probabilities. 

The indicator vectors are in the form of $\vec{z}_{p \to q}$, which
denotes the group membership of node $p$ when it is
approached by node $q$ (note that this is not symmetric).
Then, for each node $i$ a mixed membership vector $\vec{\pi}_i$ is
drawn, and the value of the interaction between this
vector and the original one of the node is sampled. The
authors also introduce a sparsity parameter to calibrate the
importance of non-interaction.

As for other mixed membership models, this is
intractable to compute. A number of approximate inference
algorithms for mixed membership models have recently
appeared, such as mean-field variational methods [148],
expectation propagation [149] and Markov chain Monte
Carlo sampling [150]. In this paper, the authors apply
mean-field variational methods to approximate the
posterior of interest, which has a complexity of $O(nk)$. An
extension of this work which also considers the degree of
the vertices as a normalization factor is [151]. A work very
related to this one, working with a very similar notion of
propagating probabilities as influence or information, is


8. Closeness 

A very intuitive notion of community in a complex net- 
work is based on the concept of how close its members are 
connected together. A community is a set of individuals 
who can communicate with each other very easily because 
they can reach any other member in a relatively lower
number of hops than the network's average. Figure 12 shows
a simple example of this configuration. The underlying
definition of community in this case is:

Meta Definition 6 (Small World Community). A small
world community in a complex network is a set of nodes
that can reach any member of its group usually by crossing
a very low number of edges, significantly lower than the
average shortest path in the network.



We use the term "small world" [153] since it conveys
the idea of very closely connected nodes.

Figure 12: An example of a graph which can be partitioned by
considering the relative distance, in terms of number of edges, among
its vertices.

A very efficient approach used with this problem definition
relies on random walks. A random walk is a process in which
at each time step a walker is on a vertex and moves to a vertex
chosen randomly and uniformly from its neighbors.
The same procedure is followed for the newly selected
vertex. This is a Markov process. However, various strategies
have been formulated in order to obtain very sophisticated
random walk based applications. For example, the popular
link analysis algorithm PageRank [121] is based on random
walks. This ends up in the following meta procedure:

Meta Procedure 5. Given a network, perform several 
random walks and then cluster together nodes which ap- 
pear frequently in the same walk. 

Algorithms in this category inherit the weakness in
multidimensional networks from Bridge Detection
algorithms, since paths are also central to this
community discovery category.

To the best of our knowledge there are three main
community discoverers that use random walks in order to
find communities whose members are very close to each
other: Walktrap [44], based on the assumption that when
performing random walks the virtual surfer is trapped in
the high density regions of the graph (i.e. the
communities); DOCS [45], a more complex framework that
also uses modularity as a fitness function; and Infomap
[46], which applies an information-theoretic approach. An
older approach in this category is the Markov Cluster
Algorithm [154], which is still commonly used, especially
in bioinformatics. It simulates a controlled flow through
random walks in a network using matrix multiplication and
inflation.
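The expansion and inflation steps of the Markov Cluster Algorithm can be illustrated with a minimal sketch. It is not the reference implementation: the self-loops, the inflation value of 2, the fixed iteration count and the way clusters are read from the rows of the final matrix are all assumptions made for illustration, and the function names are hypothetical.

```python
def normalise_columns(matrix):
    """Rescale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, iterations=30):
    """Sketch of MCL on a 0/1 adjacency matrix (list of lists)."""
    n = len(adjacency)
    # Assumption: add self-loops before normalising, a common MCL convention.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalise_columns(m)
    for _ in range(iterations):
        # Expansion: one matrix multiplication, i.e. a longer random walk.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: element-wise power, then renormalise the columns.
        m = normalise_columns([[x ** inflation for x in row] for row in m])
    # Each row that still holds flow acts as an attractor for a cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters


def adjacency_from_edges(n, edges):
    """Hypothetical helper: build a symmetric 0/1 matrix from an edge list."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a
```

Larger inflation values favour strong flows more aggressively and therefore tend to produce more, smaller clusters.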

8.1. Walktrap [44]

The Walktrap approach is based on the following
intuition: random walks are able to unveil the real
distance among nodes by frequently exploring nodes in the
same community. The key problem is the definition of the
distance function between any two vertices, computed from
the information given by random walks in the graph. Low
values of this measure mean that the two vertices i and j
"see" the network in a very similar way, thus they belong
to the same community. Therefore, this distance must be
large if the two vertices are in different communities,
and small otherwise. In the original paper this distance
is defined as:

$$r_{ij} = \sqrt{\sum_{k=1}^{n} \frac{\left(P_{ik}^{t} - P_{jk}^{t}\right)^{2}}{d(k)}}$$

where P^t_{ij} is the probability to go from i to j in t
steps and d(k) is the degree of vertex k.
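As a hedged illustration of such a random walk distance, the sketch below computes the t-step transition probabilities by repeated matrix multiplication and compares the rows of two vertices, weighting each squared difference by the inverse degree. The formula is the degree-normalised Euclidean distance between the t-step distributions as we understand the Walktrap definition; the function names are hypothetical, and the graph is assumed undirected, unweighted and without isolated vertices.

```python
import math


def transition_matrix(adjacency):
    """P[i][j] = probability of moving from i to j in one step."""
    degrees = [sum(row) for row in adjacency]
    n = len(adjacency)
    p = [[adjacency[i][j] / degrees[i] for j in range(n)] for i in range(n)]
    return p, degrees


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_distance(adjacency, i, j, t):
    """Distance between vertices i and j based on t-step random walks."""
    p, degrees = transition_matrix(adjacency)
    pt = p
    for _ in range(t - 1):
        pt = mat_mul(pt, p)  # pt now holds the t-step probabilities
    n = len(adjacency)
    return math.sqrt(sum((pt[i][k] - pt[j][k]) ** 2 / degrees[k]
                         for k in range(n)))


def adjacency_matrix(n, edges):
    """Hypothetical helper: symmetric 0/1 matrix from an edge list."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a
```

On two triangles joined by a single edge, two vertices of the same triangle obtain a much smaller distance than two vertices of different triangles.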

A critical parameter is the length t of the random 
walks: it must be sufficiently long to gather enough in- 
formation regarding the topology of the graph. However 
it must not be too long because when the length of a ran- 
dom walk starting at vertex i tends towards infinity, the 
probability of being on a vertex j only depends on the 
degree of vertex j (and not on the starting vertex i).



Similar random walk approaches are [155, 156]. However,
they are less efficient than Walktrap, whose complexity
is O(mn^2) in the worst case.

8.2. DOCS [45]

This method is based on a spectral partition and random
walk expansion, and is an extension of [157]. The general
idea is to obtain an initial guess in a first step
regarding the community structure, and then collapse or
expand these communities according to the hints given by
the random walks among them.

The first step is to coarsen the original graph into a 
series of higher level graphs. This is guided by modular- 
ity maximization. In the lazy random walk stage, vertices 
are labeled as contributing or non contributing vertices 
depending on whether or not they can be moved to an- 
other cluster and provide an increase in modularity. They 
are also sorted in a descending order by their contributing 
values. The target communities can then be extracted. 

8.3. Infomap [46]

The Infomap algorithm is one of the most accurate
community discovery methods [158]. It is based on a
combination of information-theoretic techniques and random
walks. The authors explore the graph structure with a
number of random walks of a given length and with a given
probability of jumping to a random node. This approach is
equivalent to the random surfer of the PageRank
algorithm [121].

Intuitively, the random walkers are trapped in a com- 
munity and exit from it very rarely. Each walk is described 
as a sequence of steps inside a community followed by a 
jump. By using unique names for communities and reusing 
a short code for nodes inside the community, this descrip- 
tion can be highly compressed, in the same way as re-using 
street names (nodes) inside different cities (communities).
The renaming is done by assigning a Huffman coding to
the nodes of the network. The best network partition will
result in the shortest description for all the walks.
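The idea that a good partition shortens the description of a walk can be sketched numerically. The code below evaluates the two-level description length (the map equation, as we understand it) in the simplest setting: an undirected, unweighted graph, where visit frequencies are proportional to degree and no random jumps are needed. This simplification, the function names and the use of entropy as an idealised Huffman code length are assumptions of the sketch, not the full Infomap procedure.

```python
import math


def plogp(x):
    """x * log2(x), with the usual convention 0 * log 0 = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def description_length(edges, modules):
    """Average bits per step needed to describe a random walk.

    edges:   list of undirected (u, v) pairs
    modules: dict mapping every node to a module identifier
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    volume = 2 * len(edges)
    visit = {node: d / volume for node, d in degree.items()}

    # Probability of leaving each module in one step.
    exit_prob = {mod: 0.0 for mod in set(modules.values())}
    for u, v in edges:
        if modules[u] != modules[v]:
            exit_prob[modules[u]] += 1 / volume
            exit_prob[modules[v]] += 1 / volume

    total_exit = sum(exit_prob.values())
    # Cost of the index codebook (names of the modules).
    index_cost = 0.0
    if total_exit > 0:
        index_cost = -total_exit * sum(plogp(q / total_exit)
                                       for q in exit_prob.values())
    # Cost of the module codebooks (node names plus one exit code each).
    module_cost = 0.0
    for mod, q in exit_prob.items():
        members = [n for n in visit if modules[n] == mod]
        usage = q + sum(visit[n] for n in members)
        entropy = -plogp(q / usage) - sum(plogp(visit[n] / usage)
                                          for n in members)
        module_cost += usage * entropy
    return index_cost + module_cost
```

For two triangles joined by one edge, placing each triangle in its own module yields a shorter description than placing all nodes in a single module.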

9. Structure Definition 

A number of works tackle community discovery with a 
very strong assumption: to be called a community, a group 
of vertices must follow a very strict structural property. 
In other words, they use the following meta definition of 
community: 

Meta Definition 7 (Structure Community). A struc- 
ture community in a complex network is a set of nodes with 
a precise number of edges between them, distributed in a 
very precise topology defined by a number of rules. Sets 
of nodes that do not satisfy these structural rules are not 
communities. 

The aim of the community discovery algorithm is to 
find all the maximal structures in the network that satisfy 
the desired constraints. The corresponding meta procedure
implemented in this category is simple (i.e. find in an
efficient way all the maximal structures defined) and hence
there is no need to discuss it further.

This task is similar to a very well-known data mining 
problem in network analysis: graph mining. Some
examples of graph mining algorithms are
However, traditional graph mining algorithms only return 
all the single different structure patterns with their sup- 
port. In community discovery there is only one important 
structure and the desired result is the list of all vertex 
groups that make up that structure in the network. 

We will thus ignore pure graph mining algorithms and
just focus on structural community discovery approaches.
The methods reviewed here are: clique percolation [3] and
its evolution for bipartite graphs [48], the s-plexes
detection [47] and a maximal clique approach [49]. We will
not focus on other minor evolutions, such as the k-dense
approaches [162].

Since a defined structure may be, without any con- 
straint, overlapping, weighted, directed or multidimensional, 
there is virtually no structural feature that cannot be em- 
bedded in a definition used by the algorithms in this cat- 
egory. Depending on the desired structure, analysts can 
also find communities that do not overlap with any of the 
previous categories, thus avoiding densities, or bridges or 
any other previous definition. The downside of this strat- 
egy arises when working in an incremental setting: given 
a simple modification on the structure, such as adding or 
deleting a single node or edge, the algorithm is likely to 
recompute everything from scratch. This is because prop- 
erties of the substructure that are discovered may be vio- 
lated by any single modification. 




Figure 13: The overlapping community structure detected by a 
clique-percolation approach. 



9.1. K-Cliques [3]

Palla et al. suggest that a community can be interpreted
as a union of smaller complete (fully connected)
sub-graphs that share nodes. The authors define a
k-clique-community as the union of all k-cliques that can
be reached from each other through a series of adjacent
k-cliques. Two k-cliques are said to be adjacent if they
share k − 1 nodes. A 2-clique is simply an edge and a
2-clique-community is the union of those edges that can be
reached from each other through a series of shared nodes.
Consider Figure 13. In this case the clique percolation
approach detects {0, 1, 2, 3} as a 4-clique. Then it
considers {1, 2, 3, 4}: it is again a 4-clique and it
shares 3 vertices with the previous one. Thus the two
cliques are joined in one community. The same is true for
the 4-cliques {2, 3, 4, 6} and {2, 4, 5, 6}, thus
identifying the community {0, 1, 2, 3, 4, 5, 6}. In this
process, two communities can have an overlap of some
vertices (in the example, vertices 5 and 9).

The algorithm first extracts all complete sub-graphs of 
the network that are not part of a larger complete sub- 
graph. The aim of the first phase is to populate a clique- 
clique overlap matrix. In this data structure each row 
(and column) represents a clique and the matrix elements 
are equal to the number of common nodes between the 
corresponding two cliques. The diagonal entries are equal 
to the size of the clique. The k-clique-communities can 
be found by erasing every off-diagonal entry smaller than 
k − 1. Because of the hardness of clique detection, the
complexity of this procedure is O(m^{ln m / 10}).
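The k-clique-community definition can be sketched directly, at least for small graphs. The code below is a brute-force illustration, not the procedure of the original authors: instead of maximal cliques and an overlap matrix, it enumerates every k-clique and merges cliques that share k − 1 nodes with a union-find structure. All function names are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Hypothetical helper: dict node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_clique_communities(adj, k):
    """Union of all k-cliques reachable through adjacent k-cliques."""
    # Brute-force enumeration of k-cliques (exponential, small graphs only).
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]

    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Two k-cliques are adjacent if they share k - 1 nodes.
    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) >= k - 1:
            parent[find(a)] = find(b)

    communities = {}
    for index, clique in enumerate(cliques):
        communities.setdefault(find(index), set()).update(clique)
    return sorted(communities.values(), key=sorted)
```

Because a node may belong to k-cliques of different components, the returned communities can overlap.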

9.2. S-Plexes Enumeration [47]

An s-plex is a relaxed concept of the c-isolated clique
[163], [164]. Let G = (V, E) be an undirected graph. A
set S ⊆ V of k vertices is called c-isolated if it has
less than ck outgoing edges, where an outgoing edge is an
edge between a vertex in S and a vertex in V \ S. A
c-isolated clique is a concept that is considered too
restrictive for a community. Instead, the authors use a
relaxed version of a c-isolated clique called s-plex
[165]: in an undirected graph G = (V, E), a vertex subset
S ⊆ V of size k is called an s-plex if the minimum degree
in G[S] is at least k − s. Hence, cliques are exactly
1-plexes.
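The s-plex condition is easy to check for a given vertex subset. The following sketch only tests the definition stated above (minimum degree of at least k − s in the induced sub-graph); it does not enumerate maximal s-plexes, and the function name and the adjacency format (a dict mapping each node to a set of neighbours) are assumptions.

```python
def is_s_plex(adj, subset, s):
    """True if every vertex of `subset` has at least k - s neighbours
    inside `subset`, where k is the size of the subset."""
    subset = set(subset)
    k = len(subset)
    return all(len(adj[v] & subset) >= k - s for v in subset)
```

For example, a path of three vertices is a 2-plex but not a 1-plex, while a triangle is a 1-plex, i.e. a clique.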

Since in an s-plex S of size k every vertex v ∈ S is
adjacent to at least k − s vertices, the sub-graph Ḡ[S]
induced by S in the complement graph Ḡ (the graph with
the same set of vertices and complementary edge set) is a
graph with a maximum degree of at most s − 1. The idea is
to enumerate maximal s-plexes in G by deleting minimal
sub-graphs with a maximal degree of s − 1 in the
complement graph. A key concept for this solution is the
pivot set P. The pivot set contains the pivot vertex v and
those vertices that belong to the s-plex but are not
adjacent to v. The pivot vertex is defined as the vertex
with the lowest index of those vertices with less than c
outgoing edges.

The algorithm is an evolution of [166] and removes ver- 
tices from the candidate set C with too few neighbors in 
C. It builds the complement graph, then for each possi- 
ble pivot set P applies the deletion of minimal sub-graph 
in the complement graph. Finally, it removes enumerated 
s-plexes that either have pivot u ≠ v or are not maximal.
The complexity is O(knm).

9.3. Bi-Clique [48]

This is a bipartite graph version that solves various 
issues regarding the k-clique approach [3], namely the
impossibility to analyze sparse network regions, due to the
fact that 2-clique communities are simply the connected 
components of the network. The first non-trivial k-clique 
has size k = 3 and nodes must have at least two links in 
order to qualify for participation in a 3-clique. In networks 
with heavy tailed degree distributions, a large fraction of 
the nodes have less than two edges. 

Bi-clique is a natural approach for affiliation networks, 
where in a one-mode projection all (sparse) information re- 
garding the bipartite linkages is reduced to a giant quasi- 
clique. All the information contained in edge weights is 
typically discarded in a subsequent thresholding opera- 
tion. The Bi-Clique algorithm detects structures between 
2-clique communities and 3-clique communities where the 
k-clique algorithm usually fails. 

The algorithm begins by isolating the N maximal
bi-cliques in the bipartite network using [167]. Using
this list the authors create two symmetric clique overlap
matrices for the two classes of nodes. Then, for both
matrices, elements greater than or equal to a and b (the
two parameters of the algorithm), respectively, are set to
one, while everything else is set to zero. The final
overlap matrix L is obtained by the matrix intersection,
using the AND operator. The final step is to determine the
connected components of L; each component corresponds to a
bi-clique community. The final complexity of the approach
is O(m^2).

9.4. EAGLE [49]

EAGLE starts from the following assumption: in every
dense-linked community there is at least one large clique.
This clique could be considered the core of the community.
EAGLE first finds all the maximal cliques in the network
with the Bron-Kerbosch algorithm [168] (complexity
O(3^{n/3})), discarding those whose vertices are part of
other larger maximal cliques and those with less than k
vertices. EAGLE then calculates the similarity between
each pair of communities. It then selects the pair of
communities with the maximum similarity, incorporating
them into a new community and calculating the similarity
between the new community and other communities. The
similarity measure is the modularity [13]. This
calculation is repeated until only one community remains,
thus completing the dendrogram.
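The clique extraction stage can be illustrated with the basic Bron-Kerbosch recursion. This is a minimal sketch without pivoting, followed by a size filter that stands in for the "less than k vertices" rule; the function names are hypothetical and the agglomerative merging by similarity is not shown.

```python
def bron_kerbosch(adj, r=frozenset(), p=None, x=frozenset()):
    """Yield every maximal clique of the graph.

    adj: dict node -> set of neighbours
    r:   current clique, p: candidate vertices, x: already processed vertices
    """
    if p is None:
        p = frozenset(adj)
    if not p and not x:
        yield r  # r cannot be extended, so it is maximal
        return
    for v in list(p):
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p = p - {v}
        x = x | {v}


def initial_cores(adj, k):
    """Maximal cliques with at least k vertices, used as initial communities."""
    return [clique for clique in bron_kerbosch(adj) if len(clique) >= k]
```

On a triangle with one pendant edge, the recursion returns the triangle and the pendant edge; with k = 3 only the triangle is kept as a core.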

The second stage is to cut the dendrogram. Any cut 
through the dendrogram produces a cover of the network. 
To determine the place of the cut, a measurement is re- 
quired to judge the quality of a cover, computed with a 
given variant of modularity. 

10. Link Clustering 

Some recent approaches have been based on the idea 
that the community is not a partition of network nodes, 
but a partition of the links. In other words, it is the rela- 
tionship between two entities that belongs to a particular 
environment and the entities belong to all the communities 
of their edges (or a subset of them). 

The meta procedure in this class is: 

Meta Procedure 6. We are given a set of relations M 
between a set of entities N. We cluster together relations 
that are similar, i.e. established between the same set of 
entities, and we then connect each entity n to the commu- 
nities its relations belong to. 

The underlying meta definition of community is: 

Meta Definition 8 (Link Community). A link commu- 
nity in a complex network is a set of nodes that share a 
number of relations clustered together since they belong to 
a particular relational environment. 

This approach implies an overlapping partition, since
a node belongs to all the communities of its links, and
only on rare occasions do all the links belong to a single
community. We prove this point in Section 13 by looking
at the average number of communities a node belongs to,
according to algorithms in this category. One feature that
is ignored by this community definition is the direction
of a relation, since an undirected link belongs to a single
community. There is no way to attach a relationship from
u to v to a community and a relationship from v to u to
another community, since they both belong to the same
relational environment.

There are two basic approaches to the link clustering
problem: defining a projection graph in which the nodes
represent the links of the original graph, and defining a
proximity value that measures how close two edges of the
network are. In both cases the critical point is to
measure the relations between the edges. A classical
clustering algorithm can then be applied.

The methods reviewed here reflect both approaches.
The first [50] defines the projection graph with a random
walk measure for the proximity of the projected edges,
then uses modularity to compute the modules of the
network. The second one [51] is a general framework in
which it is possible to define any distance measure for
the nodes (such as the Jaccard index) and then apply a
classical hierarchical clustering technique based on this
distance definition. Finally we also present a Bayesian
approach to this problem [52].



10.1. Link modularity [50]

In this work, by defining communities as a partition of
the links rather than the set of nodes, the authors
interpret the usual modularity Q in terms of a random
walker moving on the nodes. They further define two
walking strategies: a link-link and a link-node-link
random walk. They project the adjacency matrix onto a
bipartite incidence matrix. The elements B_{iα} of this
n × m matrix are equal to 1 if link α is related to node
i, and 0 otherwise.

The incidence matrix is then projected onto a line graph: 
a link is added between two nodes in this projected graph 
if these two nodes have at least one node of the other 
type in common in the original incidence bipartite graph. 
Modularity is then computed on this line graph. The total
complexity of creating the line graph and computing
modularity is O(2mk log n).
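The projection from the incidence matrix to the line graph can be sketched as follows. The code builds the n × m incidence matrix of an undirected graph and connects two links whenever they share an endpoint. It covers only the unweighted projection: the random walk weighting schemes and the modularity computation are not shown, and the function names are hypothetical.

```python
from itertools import combinations


def incidence_matrix(n, edges):
    """b[i][a] = 1 if link a is incident to node i, and 0 otherwise."""
    b = [[0] * len(edges) for _ in range(n)]
    for a, (u, v) in enumerate(edges):
        b[u][a] = 1
        b[v][a] = 1
    return b


def line_graph(n, edges):
    """Set of pairs (a1, a2) of link indices that share at least one node."""
    b = incidence_matrix(n, edges)
    projected = set()
    for a1, a2 in combinations(range(len(edges)), 2):
        if any(b[i][a1] and b[i][a2] for i in range(n)):
            projected.add((a1, a2))
    return projected
```

For the path 0-1-2-3 the three links form a path in the line graph as well: link 0 is adjacent to link 1, and link 1 to link 2.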

10.2. Hierarchical Link Clustering HLC* [51]

In this approach, the authors start from the assump- 
tion that whereas nodes belong to multiple groups (in- 
dividuals have families, co-workers and friends), links of- 
ten exist for one dominant reason (two people are in the 
same family, work together or have common interests) and 
therefore they cluster them. They define a link similarity 
measure as the Jaccard coefficient. This measure is com- 
puted on the sets of neighbors of each edge sharing one 
node (i.e. only adjacent edges). The formula used is:

$$S(e_{ik}, e_{jk}) = \frac{|n_{+}(i) \cap n_{+}(j)|}{|n_{+}(i) \cup n_{+}(j)|}$$

where e_{ik} is an edge between nodes i and k and n_+(i)
is the set of neighbors of node i. The approach can be
used with an arbitrary similarity function for the edges.
Furthermore, although weights and multipartite structures
are not considered with this formula, the authors claim
that it is possible to extend the approach in order to
obtain such features.
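A short sketch of the link similarity may help. It computes the Jaccard coefficient between the neighborhoods of the two non-shared endpoints of a pair of adjacent edges. We assume here that the neighbor set of a node includes the node itself (inclusive neighbors); this convention and the function name are assumptions of the sketch.

```python
def link_similarity(adj, i, j, k):
    """Jaccard similarity of the edges (i, k) and (j, k) sharing node k.

    adj: dict node -> set of neighbours
    """
    if k not in adj[i] or k not in adj[j]:
        raise ValueError("both edges must exist and share node k")
    # Assumption: inclusive neighbourhoods (the node plus its neighbours).
    n_i = adj[i] | {i}
    n_j = adj[j] | {j}
    return len(n_i & n_j) / len(n_i | n_j)
```

In a triangle, any two edges are maximally similar (similarity 1), while the two edges of a path of three nodes only share the middle node and obtain a similarity of 1/3.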

The authors then build a dendrogram with a classical
hierarchical clustering approach using the defined
similarity measure, with a time complexity of O(nK^2). In
the dendrogram each leaf is a link from the original
network and branches represent link communities. In the
hierarchical structure identified, links occupy unique
communities whereas nodes naturally occupy multiple
communities, owing to their links. Thus the extracted
network structure is both hierarchical and overlapping.
The dendrogram is then cut by optimizing the partition
density objective function [169].

11. Link Maximum Likelihood [52]

In this work the general idea of link clustering is
combined with multidimensional networks: the idea is that
communities arise when there are different types of edges,
i.e. dimensions, in a network. Basically the approach is
to generate a model for the observed network with a given
partition of edges into link communities and then test
these communities with a maximum likelihood approach. The
generation and test is very similar to the technique
implemented in the Expectation Maximization [56] presented
in the following section, but in this case it is applied
to edges instead of nodes.

12. No Definition 

There are a number of frameworks for community dis- 
covery that use a very trivial definition of community or 
have no definition at all. These methods often assume that 
there are some desirable features of the community that 
are not provided by many algorithms. They define prepro- 
cessing and/or postprocessing operations and then apply 
them to a number of other different known methods which 
do not extract communities with the desired features. In 
this way they improve the results. 

Basically, the meta definition adopted is: 

Meta Definition 9 (Community). Communities in a 
complex network are sets which present a number of par- 
ticular features regardless of why their nodes are grouped 
together. 

Of course, the meta procedures and features of these 
approaches depend on both the pre/postprocess and the 
"hosted" method. The works which present a proper def- 
inition of a community are, for instance, the evolutionary 
clustering [2H or the CONGA algorithm [sH'l, which have 
already been outlined in this survey. Given that we have 
presented their desired common features for the sets in the 
form of an independent community definition, we have not 
included these methods in this category. 

Instead we focus on four methods: the first is a hybrid
framework combining Bayesian and non-Bayesian approaches
[53]; the second relies on a custom definition of
community given by the analyst and then performs a
multidimensional community discovery, by identifying the
noisy relations inside the network [54]; the third is a
Bayesian hierarchical approach [55]; and the last one is
based on an expectation maximization principle [56].
12.1. Hybrid* [53]

For this framework, the authors start from the point 
that overlapping communities are a more precise descrip- 
tion for the multiplicity of node links compared to non- 
overlapping approaches. If a node's links cannot be ex- 
plained by a single membership, then the community dis- 
covery problem has to be solved in an overlapping formula- 
tion. On the other hand, if a node's links can be explained 
almost equally well by a number of single and mixed mem- 
berships, hard clustering may be simpler. The conclusion 
is that a combination of an overlapping community dis- 
coverer that takes an already hard defined community as 
input with a non overlapping method should perform bet- 
ter. Thus the HFCD framework is built. It is made up of 
three parts: the Bayesian core, the hint source procedure 
and the coalescing strategies. 

The Bayesian core is the overlapping community
discovery algorithm that collects the hints from the other
non overlapping method and outputs the final community
partition. In [53] the authors use a Latent Dirichlet
Allocation on Graphs as their core method. The
Bayesian core needs some hints in order to perform the
community discovery procedure. These hints are provided
by any other non overlapping community detection
algorithm, namely modularity [18] and Cross Associations [58]
(here reviewed in its evolution as a Context-specific Clus- 
ter Tree ^). 

The most important contribution of this approach is 
in creating a procedure that solves the problem of how to 
incorporate the hints into the core model. This is done 
by the coalescing strategies. The authors propose three 
different strategies: attributes (each community is an at- 
tribute of the node), seeds (the community partition is 
used as an initial configuration of the second community 
discovery phase), and prior (a mix of the previous two). In 
order to make the inference procedure both for attributes
and for the initial configuration, the authors use the Gibbs
sampling technique [172]. The additional complexity over
the used methods is O(nkK).



12.2. Multi-relational Regression [54]

This algorithm aims to discover hidden multidimen- 
sional communities. The authors use the term "relation" 
for a dimension, i.e. a criterion to connect entities. They 
define relation networks, group them together and create 
a kind of social network, calling it a multi-relational social 
network or heterogeneous social network, another name 
for a multidimensional or multiplex network. The basic 
assumption is that each relation (explicit or implicit) plays 
a different role in different tasks. 

For instance consider the multidimensional network in
Figure 14. The authors suppose that an analyst might
want to specify that nodes 8, 9, 10 and 11 belong to the
same community. The three dimensions (represented by
solid, dashed and thick edges) then have a different
importance in reflecting the user's information need. The
thick dimension can be considered as noise, and the most
important dimension is obviously the dashed dimension.
The community discovery process should take this
situation into account in order to provide an output close
to the information needs of the user.

Figure 14: A multidimensional network. Solid, dashed and thick
lines represent edges in three different dimensions.

The authors thus represent each relation with a weighted 
matrix. Each element in the matrix reflects the relation 
strength between the two corresponding entities. This ma- 
trix is then mined depending on a user example (or infor- 
mation need): the user submits a query defining the de- 
sired community structure. From this structure, the algo- 
rithm reconstructs the possible hidden relation, combining 
the single relation graphs with linear techniques, and then 
performs the community discovery on the resulting hidden 
graph. 

The hidden relation is tackled as a prediction problem: 
once the combination coefficients of the desired entities 
and the desired relations are computed, the hidden relation 
strength between any object pair can be predicted. This 
is a regression problem that can be solved with a number
of techniques [173]. For a discussion of the issues in this
solution based on unconstrained linear regression see
[174]. The exact regression used is the Ridge Regression.



12.3. Hierarchical Bayes [55]

In this work the authors start from the assumption that
many real world networks present a hidden hierarchical
organization able to explain some of the basic properties
of the structure. By reconstructing this latent
organization, they are able to group together nodes which
are part of the same functional module of the network. It
is evident that there is no traditional definition of
community at all, and the authors also acknowledge that
reconstructing the hidden dendrogram is a task which goes
beyond simple clustering.

Basically, the authors generate and sample a set of
dendrograms, which are able to generate a random network
with similar features to the observed network, with a
Monte Carlo algorithm. The sampling is driven by the
maximum likelihood, i.e. the dendrograms are extracted
according to how well they can reproduce the observed
features. By varying the p_r parameter, the probability of
joining two vertices in the dendrogram, the authors can
tune the dendrogram generation in order to fit different
properties of the network. Finally, the set of dendrograms
is merged into a single consensus dendrogram, which is the
best overall representation of the observed network.
Although their technique presents an exponential time
complexity in the worst case, the authors found that on
average the complexity should not exceed O(n^2).

12.4. Expectation Maximization

This work acknowledges the standard assumption in the
community discovery literature, i.e. to define what a
community is and then to implement an algorithmic
procedure able to create a partition of the network which
reflects the best community division according to the
starting definition. However, the problem is that
sometimes it is hard to define a priori what a community
is in a particular network, and failing to do so may lead
to results that are not significant. The proposed method
is instead able to adapt its definition of community to
the one most likely present in the data, which may be any
one of the classifications presented in this paper.

Basically the authors consider the group membership
of each node as an unknown feature. They then define for
each vertex i the probability that a (directed) link from
a particular vertex in group r connects to vertex i as
η_{ri}. Finally, π_r is the probability of belonging to
group r. Both η_{ri} and π_r are unknown and depend on
each other. With an iterative, self-consistent approach
that evaluates both simultaneously, two characteristic
equations which define the expectation maximization
algorithm are derived, and the problem can then be solved.
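The self-consistent iteration can be sketched as follows. The survey does not state the two equations, so the updates below follow the standard expectation maximization scheme for a mixture model of links as we understand it, and should be read as an assumption: soft memberships are recomputed from the current parameters, then the group sizes and the link probabilities are re-estimated from the memberships. All names, the random initialization and the smoothing constant are hypothetical choices.

```python
import math
import random


def em_link_mixture(adjacency, groups, iterations=100, seed=0):
    """EM sketch on a 0/1 adjacency matrix (list of lists).

    Returns (group_prob, link_prob, membership) where
    group_prob[r]    estimates the probability of belonging to group r,
    link_prob[r][i]  estimates the probability that a link from a vertex
                     in group r connects to vertex i,
    membership[i][r] is the soft assignment of vertex i to group r.
    """
    rng = random.Random(seed)
    n = len(adjacency)
    eps = 1e-12  # smoothing to avoid log(0) and division by zero
    membership = []
    for _ in range(n):
        row = [rng.random() + 0.1 for _ in range(groups)]
        total = sum(row)
        membership.append([x / total for x in row])
    out_degree = [sum(row) for row in adjacency]

    group_prob, link_prob = [], []
    for _ in range(iterations):
        # Maximization: re-estimate the parameters from the memberships.
        group_prob = [sum(membership[i][r] for i in range(n)) / n
                      for r in range(groups)]
        link_prob = []
        for r in range(groups):
            denom = sum(out_degree[i] * membership[i][r] for i in range(n)) + eps
            link_prob.append([
                sum(adjacency[i][j] * membership[i][r] for i in range(n)) / denom
                for j in range(n)])
        # Expectation: recompute the memberships, working in log space.
        for i in range(n):
            logs = [math.log(group_prob[r] + eps)
                    + sum(adjacency[i][j] * math.log(link_prob[r][j] + eps)
                          for j in range(n))
                    for r in range(groups)]
            top = max(logs)
            weights = [math.exp(value - top) for value in logs]
            total = sum(weights)
            membership[i] = [w / total for w in weights]
    return group_prob, link_prob, membership
```

Since the procedure can converge to a local optimum, in practice it would be restarted from several random initializations, keeping the solution with the highest likelihood.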

13. Experiments 

In this section we briefly present an experimental eval- 
uation of some of the presented algorithms. The aim is 
to strengthen the intuition regarding the desired features 
which each category is either able to present naturally or 
has difficulties with. 

In order to do this, as our benchmark we used a network
extracted from the ego network of one of the authors'
Facebook profiles. We depicted the graph used in Figure
15. The network contains 261 nodes and 1,722 edges. We
chose this network because the human eye can easily spot 
natural denser areas: there are four main ones at the bot- 
tom and left hand side of the picture and three big areas 
in the upper right hand side, while in the middle there is a 
sort of gray area and smaller cliques and quasi-cliques of 3- 
7 nodes float around. We also have a thorough knowledge 
regarding the nodes and the actual community partition of 
these people from the perspective of the network ego, how- 
ever for privacy reasons we cannot include more detailed 
data. 




Figure 15: Our benchmark network. 



We have tried to include as many algorithms as possible
in this section. We excluded reviewed methods for
any of the following reasons: we were not able to find any 
implementation (or working implementation) freely avail- 
able, the algorithm did not provide better knowledge re- 
garding its category being very similar to another already 
included, or the algorithm was not suitable for real-world 
purposes, i.e. it was not able to provide a result on our ex- 
ample network in less than two hours and 1GB of memory 
occupation (for a 37kB input). 

All of the evaluation measures used take a partition P
of the network as input, i.e. a list of sets of nodes which
may or may not have common elements (i.e. overlap).

• Modularity (Q). Although there are overlapping
definitions for this measure [90], the main version
used is the standard one, which is not defined for
overlapping partitions. Therefore, we computed the
original version of Modularity only for non-overlapping
results.

• Flake-ODF (f), introduced in [175], is defined as
the fraction of nodes in a community that have fewer
edges pointing inside than outside of the cluster.
We calculate the average over all communities, i.e.

$$f(P) = \sum_{k \in P} \frac{|\{u \in k : |\{(u,v) \in m : v \in k\}| < |\{(u,v) \in m : v \notin k\}|\}|}{|P|}$$

In [175] many evaluation measures are presented in order
to solve the monotonic increase in modularity (i.e. the
resolution problem: bigger clusters tend to score
better). However, we tested all of them in our
experimental setting (some are not reported here for the
sake of readability) and we found that all tend to
assign consistently lower scores to overlapping
partitions in the same network. Thus, these measures
should be refined in order to be more general and to
include the very common and popular overlap feature.



We implicitly thank all the authors of the included algorithms
for making them available or sending them to us.
Algorithm           Communities   Avg. nodes   Q       f        C^-1    o
SocDim              12            45.583       N/A     6.583    0.451   2.096
Autopart            6             43.500       0.309   18.500   0.212   1
Modularity          8             32.625       0.724   0.375    0.744   1
Local Density       31            8.419        0.714   0.226    0.549   1
Edge Betweenness    11            23.727       0.738   0.455    0.656   1
CONGA               119           5.277        N/A     3.958    0.076   2.406
Label Propagation   13            20.077       0.735   0.385    0.616   1
Walktrap            12            21.750       0.738   0.250    0.652   1
Infomap             17            15.353       0.721   0.765    0.510   1
K-Clique            16            16.125       N/A     1.562    0.341   0.989
S-Plex              96            3.615        N/A     2.417    0.070   1.330
Link Modularity     37            26.216       N/A     3.730    0.395   3.716
HLC                 256           3.734        N/A     2.539    0.063   3.663

Table 3: The statistical parameters of communities extracted with different approaches.



• Reverse Conductance ($C^{-1}$). Conductance is also 
presented in [175] as the fraction of total edge volume 
that points outside the cluster. We are interested in 
the reverse concept, i.e. the fraction of total edge 
volume that points inside the cluster, i.e. 
$C^{-1}(P) = \frac{1}{|P|} \sum_{k \in P} \frac{2 m_k}{2 m_k + c_k}$, 
where $m_k = |\{(u,v) \in m : u \in k \wedge v \in k\}|$ 
and $c_k = |\{(u,v) \in m : u \in k \wedge v \notin k\}|$. 

• Overlap Ratio (o) is informally defined as the 
average number of communities that a node belongs to 
in the network, i.e. 
$o(P) = \sum_{n \in N} \frac{|\{k \in P : n \in k\}|}{|N|}$. 
While a non-overlapping community discovery algorithm 
usually returns 1 in this metric, if an algorithm does not 
cluster all the vertices in the network then it may return 
a value less than 1. 
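
As an illustration of the measures listed above, the following minimal Python sketch computes them on a toy graph. It rests on several assumptions that are not taken from the paper: the graph is undirected and unweighted and is stored as an adjacency dictionary; Flake-ODF is averaged as the number of nodes with fewer than half of their edges inside, per community; Reverse Conductance is computed as 2m_k / (2m_k + c_k); a community without any edge volume contributes zero; Modularity is the standard Newman-Girvan form. All function names and the toy graph are hypothetical.

```python
def build_adjacency(edges):
    """Undirected adjacency dictionary {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def internal_degree(adj, u, community):
    """Number of neighbours of u that lie inside the community."""
    return len(adj[u] & community)


def flake_odf(adj, cover):
    """Average number of nodes per community with d_in(u) < d(u) / 2."""
    bad = 0
    for k in cover:
        bad += sum(1 for u in k if internal_degree(adj, u, k) < len(adj[u]) / 2)
    return bad / len(cover)


def reverse_conductance(adj, cover):
    """Average share of a community's edge volume that points inside it."""
    total = 0.0
    for k in cover:
        inside = sum(internal_degree(adj, u, k) for u in k)  # 2 * m_k
        volume = sum(len(adj[u]) for u in k)                 # 2 * m_k + c_k
        total += inside / volume if volume else 0.0
    return total / len(cover)


def overlap_ratio(adj, cover):
    """Average number of communities a node belongs to."""
    memberships = sum(1 for n in adj for k in cover if n in k)
    return memberships / len(adj)


def modularity(adj, partition):
    """Newman-Girvan modularity; only meaningful without overlap."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    q = 0.0
    for k in partition:
        inside = sum(internal_degree(adj, u, k) for u in k)
        volume = sum(len(adj[u]) for u in k)
        q += inside / two_m - (volume / two_m) ** 2
    return q


# Toy graph: two triangles joined by the bridge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = build_adjacency(edges)
partition = [{0, 1, 2}, {3, 4, 5}]          # non-overlapping
overlapping = [{0, 1, 2, 3}, {2, 3, 4, 5}]  # nodes 2 and 3 are shared

for name, cover in [("partition", partition), ("overlapping", overlapping)]:
    print(name, flake_odf(adj, cover), reverse_conductance(adj, cover),
          overlap_ratio(adj, cover))
print("Q =", modularity(adj, partition))
```

On this toy graph the non-overlapping partition obtains f' = 0, C^{-1} = 6/7 and o = 1, whereas the overlapping cover obtains f' = 1, C^{-1} = 0.8 and o = 4/3: the two nodes in the overlap zone are exactly the ones counted by Flake-ODF. A community made of a single vertex has no internal edges and therefore contributes zero to the Reverse Conductance sum.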

We report the final results in Table 4, in which we have one 
row per algorithm and one column per measure. We added 
some statistically simple parameters such as the number of 
communities and average number of nodes per community. 
For the measures, in Table 4 we use the same notation used 
in this section to present them. 
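
The two simple parameters are tied to the Overlap Ratio: the sum of the community sizes equals the total number of node memberships, so o equals the number of communities times the average community size, divided by the number of nodes. The sketch below checks this relation against the reported values. The network size of 261 nodes is not stated in this section; it is inferred here from the non-overlapping rows (for example 8 × 32.625 = 261) and should be read as an assumption, as should the variable and function names.

```python
# (number of communities, average nodes per community, reported overlap ratio)
rows = {
    "SocDim": (12, 45.583, 2.096),
    "Autopart": (6, 43.500, 1.0),
    "Modularity": (8, 32.625, 1.0),
    "Local Density": (31, 8.419, 1.0),
    "Edge Betweenness": (11, 23.727, 1.0),
    "CONGA": (119, 5.277, 2.406),
    "Label Propagation": (13, 20.077, 1.0),
    "Walktrap": (12, 21.750, 1.0),
    "Infomap": (17, 15.353, 1.0),
    "K-Clique": (16, 16.125, 0.989),
    "S-Plex": (96, 3.615, 1.330),
    "Link Modularity": (37, 26.216, 3.716),
    "HLC": (256, 3.734, 3.663),
}

NUM_NODES = 261  # assumption: inferred from the non-overlapping rows


def implied_overlap(num_communities, avg_size, num_nodes=NUM_NODES):
    """Overlap ratio implied by the community count and average size."""
    return num_communities * avg_size / num_nodes


for name, (k, n_avg, o) in rows.items():
    print(f"{name:18s} reported o = {o:.3f}   implied o = {implied_overlap(k, n_avg):.3f}")
```

Under this assumption the implied and reported values agree up to the rounding of the average community size, which also makes explicit why K-Clique, covering only 258 of the 261 memberships needed, scores below 1.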

We are now able to provide an additional reason for 
our classification by analyzing the presented results. 

SocDim and Autopart belong to the Feature Distance 
category. As discussed in Section 4, this category 
includes methods with virtually any feature (for example, 
SocDim is multidimensional and overlapping, while 
Autopart is parameter free and allows directed edges). The 
downside is that the resulting partition can be 
counterintuitive with respect to the graph topology. It is 
easy to see, in fact, how poorly Autopart scores in the 
Modularity test (Q). However, since we did not compute 
Modularity for the overlapping SocDim partition, we also 
used the Flake-ODF measure (f'). In this case too, both 
SocDim and Autopart got higher values than the other 
methods, i.e. it is more frequent that a node has 
more edges pointing outside the cluster than pointing in. 
Overlapping partitions usually have the lowest performance 
according to Flake-ODF, and to Conductance, since nodes 
in the overlap zone are densely connected to two or more 
clusters. However, Autopart is not an overlapping method, 
and SocDim turned out to be the worst of the overlapping 
algorithms according to this evaluation. 



For the Internal Density category (Section 5) we tested 
the Modularity and Local Density algorithms. Their edge 
volume inside the community (Reverse Conductance $C^{-1}$) 
is high. Modularity obtained the highest score, 
while Local Density scored well, although it did not come 
second for implementation reasons (the algorithm returns 
some communities with only one vertex, which obviously 
contribute zero to the sum). 

As stated in Section 6 regarding bridge detection 
community discovery, no assumptions about the density of 
the clusters are made. Thus these algorithms may have a 
high score on the Reverse Conductance (Edge Betweenness), 
or may not (CONGA). 

Unfortunately, our set of algorithms for the Diffusion 
category (Section 7) is very narrow and no conclusions 
can be drawn. Instead, the Closeness algorithms (Section 8) 
Walktrap and Infomap highlight their independence from 
a simple density definition: Walktrap favors a few bigger 
(and denser) communities, while Infomap focuses on 
smaller, lower-level and sparser ones. 

There is one clear downside to the Structure definition 
category (Section 9): the K-Clique algorithm has an 
overlap ratio o less than one, since its structure 
definition is very strict and many nodes cannot satisfy it, 
ending up in no community. 

Finally, algorithms in the Link Community category 
(Section 10) gave a very high overlap score (o). This shows 
that clustering edges is a natural and automatic way to get 
highly overlapping partitions. 
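
To see why clustering edges yields overlapping node communities, consider the following sketch. It is not the procedure of any specific Link Community algorithm: the edge clusters are assigned by hand on a hypothetical toy graph, and the function names are illustrative. Each edge belongs to exactly one cluster, yet a node inherits one membership from every cluster that contains one of its edges.

```python
def node_communities(edge_clusters):
    """Map each cluster of edges to the set of its endpoints."""
    return [{node for edge in cluster for node in edge}
            for cluster in edge_clusters]


def overlap_ratio(communities, nodes):
    """Average number of communities a node belongs to."""
    memberships = sum(1 for n in nodes for c in communities if n in c)
    return memberships / len(nodes)


# A strict partition of the edges of two triangles joined by a bridge.
edge_clusters = [
    [(0, 1), (0, 2), (1, 2)],  # first triangle
    [(2, 3)],                  # the bridge forms its own cluster
    [(3, 4), (3, 5), (4, 5)],  # second triangle
]

communities = node_communities(edge_clusters)
nodes = set().union(*communities)

print(communities)
print(overlap_ratio(communities, nodes))
```

Although no edge is shared between clusters, nodes 2 and 3 each end up in two communities, so the overlap ratio rises above 1 without any explicit handling of overlap.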

14. Related Works 

Over the last decade, several reviews of community 
discovery methods have been published. We would consider 
the most important to be [176, 177, 178, 179, 180, 181]. 

Fortunato and Castellano [179], hugely extended by 
Fortunato in [6], have published the most recent and 
probably the most comprehensive review on the community 
discovery problem. To tackle the problem they consider 
various definitions of community (local, global and 
vertex similarity), features of communities for extraction, 
and different categories. The number of algorithms and 
references they considered is impressive. We believe that a 
new review of this topic is needed because, although the 
authors analyze the main techniques of each method for 
community detection, they do not build an organization of 
community definitions (while acknowledging that different 
ones exist). Thus, they do not consider the main 
contribution of our review: the creation of a classification 
of community based on the definitions of state-of-the-art 
algorithms. Without focusing on a classification of 
community definitions, Fortunato and Castellano's review 
cannot be used by a researcher with his/her own definition 
of what a community is in order to find the most relevant 
set of methods for his/her problem. Their review is aimed at 
people interested in building a new community detection 
algorithm, not people who want to use the methods in 
the literature. Furthermore, their work does not include 
some more advanced features and definitions of community 
found in the literature, such as multidimensionality 
or an influence spread formulation of the problem. 

Porter et al. [180] and Schaeffer [181] have also recently 
reviewed community discovery methods. In [181] they also 
introduced the problem of a comprehensive meta definition 
of community in a graph. Again, however, although they 
begin to provide different definitions of community, they 
do not create a classification of community discovery 
algorithms based on such definitions. 

In Newman's pioneering work [176] he organizes historical 
approaches to community discovery in complex networks 
following their traditional fields of application. He 
presents the most important classical approaches in 
computer science and sociology, enumerating algorithms such 
as spectral bisection [17] or hierarchical clustering [182]. 
He then reviews new physical approaches to the commu- 
nity discovery problem, including the known edge between- 
ness and modularity (ssj ]. His paper is very useful for 
a historical perspective, but it records few works and 
obviously does not take into account all the algorithms 
and categories of methods that have been developed since 
it was published. 

Chakrabarti and Faloutsos [177] give a complete survey 
of many aspects of graph mining. One important chapter 
discusses community detection concepts, techniques and 
tools. The authors introduce the basic concepts of the 
classical notion of community structure based on edge 
density, along with other key concepts such as transitivity, 
edge betweenness and resilience. However, this survey is 
not explicitly devoted to the community discovery 
problem. It describes existing methods but does not 
investigate the possibility of different definitions of 
community or of a more complex analysis. 

Danon et al. [178] test an impressive number of different 
community discovery algorithms. They compare the 
time complexity and performance of the methods considered. 
Furthermore, they define a heuristic to evaluate the 
results of each algorithm and also compare their 
performance. However, they focus more on a practical 
comparison of the methods, rather than a true classification, 
both in terms of a community definition and in the features 
considered for the input network. 

Various authors have also proposed a benchmark graph, 
which would be useful to test community discovery 
algorithms [183]. 

15. Conclusions 

The aim of this survey was to create a manual for 
the community discovery problem, to answer the question: 
"Given what is considered a community for analysts, which 
community detection algorithm should they use?". This is 
a sort of orthogonal point of view compared to the classi- 
cal approach of community discovery reviews, traditionally 
aimed at analysts already within the community discovery 
field. 

We first tackled the problem of the lack of a universally 
accepted definition of what a community is. As pointed out 
by Fortunato [6], this lack of a theoretical framework has 
some important consequences not only in the community 
detection task itself (if we do not agree on the meaning of 
"community" how can we extract a community from the 
network?) but also in other aspects. One of these aspects 
is, for instance, the evaluation of an algorithm w.r.t. the 
results from another approach using a different definition 
of community. 

We have proposed a meta definition of community, and 
on this basis we built a new classification of community 
discovery methods based on the relationship of each 
definition of community with the general meta definition. 
We have reviewed the approaches according to general 
categories such as internal density, community structure 
definition and so on. This classification is a proposed an- 
swer to the problems highlighted by Fortunato. Each main 
method is then briefly presented, along with its relation- 
ship with other algorithms, its complexity and the strong 
and weak points of the category it belongs to. 

A crucial problem that we have identified is the need for 
an extensive study of the overlap between the definitions of 
community. As pointed out in Section 3.1, there are several 
complex connections between different definitions and 
different algorithms. It would be worth creating an accurate 
graph representation of this overlap, in which the nodes 
are algorithms, connected if they share part of their 
community definition, some features of the input/output, 
some quality functions or a search space exploration 
approach. This multidimensional complex network could be 
studied in order to have a clearer and more detailed view 
of the community discovery problem. 
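
A first approximation of such a network can be sketched from the experimental section alone. The example below links two algorithms in a "definition" dimension when they fall in the same category, and in an "overlap" dimension when both return overlapping communities. The choice of these two dimensions, the data structures and the function name are illustrative assumptions; some category assignments are inferred rather than stated explicitly, as flagged in the comments, and a full study would add the remaining dimensions (quality functions, search strategy, other features).

```python
from itertools import combinations

# Category of each tested algorithm.
category = {
    "SocDim": "Feature Distance",
    "Autopart": "Feature Distance",
    "Modularity": "Internal Density",
    "Local Density": "Internal Density",
    "Edge Betweenness": "Bridge Detection",
    "CONGA": "Bridge Detection",
    "Label Propagation": "Diffusion",     # inferred, not stated explicitly
    "Walktrap": "Closeness",
    "Infomap": "Closeness",
    "K-Clique": "Structure Definition",
    "S-Plex": "Structure Definition",     # inferred, not stated explicitly
    "Link Modularity": "Link Community",  # inferred, not stated explicitly
    "HLC": "Link Community",              # inferred, not stated explicitly
}

# Assumption: an algorithm is overlapping when Modularity was not computed for it.
overlapping = {"SocDim", "CONGA", "K-Clique", "S-Plex", "Link Modularity", "HLC"}


def build_multidimensional_edges(category, overlapping):
    """Return {(a, b): set of dimensions in which a and b are connected}."""
    edges = {}
    for a, b in combinations(sorted(category), 2):
        dims = set()
        if category[a] == category[b]:
            dims.add("definition")
        if a in overlapping and b in overlapping:
            dims.add("overlap")
        if dims:
            edges[(a, b)] = dims
    return edges


edges = build_multidimensional_edges(category, overlapping)
for pair, dims in sorted(edges.items()):
    print(pair, sorted(dims))
```

Pairs such as K-Clique and S-Plex are connected in both dimensions, while Label Propagation stays isolated; this is the kind of structural information the proposed network would expose once more dimensions are added.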

Another contribution of this paper is the inclusion of 
the important innovative features of graph partition 
algorithms, which have not been considered in other reviews. 
The definition of different features is critical because 
clearly there is no "perfect method". However, methods that 
are or are not able to consider multidimensionality, 
algorithms that do or do not treat overlapping communities, 
and so on, can be categorized as such. We have discussed this 
point in each category, trying to highlight which features 
are naturally provided by each category and which ones are 
not. We chose to include novel features like 
multidimensionality, so far not considered by community 
discovery reviews, since they add a fundamental analytical 
power that better describes real world phenomena. Moreover, 
an approach is not necessarily better if it has a longer list 
of supported features: in some cases a specialized method can 
achieve a better performance than a general one. Thus we 
believe that Table 2 is useful for checking the features of 
all algorithms. We hope this will help analysts to find the 
desired algorithm also in terms of features and not only 
the underlying definition of community. 

To define and predict what will be the most important 
features in the future is another open question that we 
leave for future work. There is interest especially in multi- 
dimensionality 7^, 86, 101 . 52, H^], perceived as a feature 
that is part of the solution and not only as an input to be 
preprocessed. In other words, we want not only to con- 
sider multidimensionality as an input, but also to extract 
truly multidimensional communities. Another interesting 
feature might be the presence of both a hierarchical and 
overlapping organization of the community structure at 
the same time, since these two features are no longer seen 
as being mutually exclusive [51]. 